This article discusses the usage of a partition-based Fubini calculus for Poisson processes. The approach is an
amplification of Bayesian techniques developed in Lo and Weng for gamma/Dirichlet processes. Applications are
considered to models which all fall within an inhomogeneous spatial extension of the size-biased framework used
in Perman, Pitman and Yor. Among the results, an explicit partition-based calculus is developed
for such models, which also includes a series of important exponential change of measure formulae. These results
are then applied to solve the mostly unknown calculus for spatial Levy-Cox moving average models. The analysis
then proceeds to exploit a structural feature of a scaling operation which arises in Brownian excursion theory.
From this a series of new mixture representations and posterior characterizations for large classes of random
measures, including probability measures, are given. These results are applied to yield new results/identities
related to the large class of two-parameter Poisson-Dirichlet models. The results also easily yield perhaps the
most general and certainly quite informative characterizations of extensions of the Markov-Krein correspondence
exhibited by the linear functionals of Dirichlet processes. This article then defines a natural extension of Doksum's
Neutral to the Right (NTR) priors to a spatial setting. NTR models are practically synonymous with exponential
functions of subordinators and arise in Bayesian nonparametric survival models. It is shown that manipulation of
the exponential formulae makes what has otherwise been a formidable analysis transparent. Additional interesting
results related to the Dirichlet process and other measures are developed. Based on practical considerations,
computational procedures which are extensions of the Chinese restaurant process are also developed.

1 Introduction

This paper discusses the active usage of a Poisson process partition based Fubini calculus to solve a variety
of problems. That is, the method will be used to solve problems associated with large classes of random
partitions $p := \{C_1, \ldots, C_{n(p)}\}$ of the integers $\{1, \ldots, n\}$. The method is based on the formal statement
of two results, which are known in various levels of generality, concerning a Laplace functional change of
measure and a partition based Fubini representation. In terms of technique, this is an amplification of the
methods discussed in Lo and Weng (1989) [see also Lo (1984)] for a class of Bayesian nonparametric weighted
gamma process mixture models. The idea to choose a Poisson process framework was based on suggestions
from Jim Pitman. The utility of the approach is demonstrated by its application to a suite of problems
which are within the general size-biased framework of Perman, Pitman and Yor (1992, Section 4), with
now a spatial inhomogeneous component. Methodologically this article may be viewed as a treatment of
combinatorial stochastic processes from a Bayesian (infinite-dimensional calculus) technical viewpoint.

A key role in the works of Lo (1984) and Lo and Weng (1989) is played by a partition distribution on the
integers $\{1, \ldots, n\}$ which is a variant of the Ewens sampling formula [see Ewens (1972) and Antoniak (1974)]
associated with the Poisson-Dirichlet partition distribution. One particular feature is that posterior quantities
written with respect to a Blackwell and MacQueen (1973) urn scheme can be further simplified to calculations
which amount to sums over partitions $p$ of $\{1, \ldots, n\}$. This makes $p$ what I term a separating class.
The methodology discussed there amounts to a partition based Fubini calculus for Dirichlet [see Ferguson
(1973, 1974), Freedman (1963) and Fabius (1963)] and gamma processes. Pitman (1996) extends the description
of the Blackwell-MacQueen sampling from the Dirichlet process to a large class of species sampling
random measures. Importantly, he develops ideas surrounding the two-parameter Poisson-Dirichlet family of
distributions in a Bayesian context. This provides a nice bridge to related work, where the two-parameter
family appears, in for instance Pitman (1995a, b, 1997a, 1999), Perman, Pitman and Yor (1992), Pitman and
Yor (1992, 1997, 2001). Those works are non-Bayesian and center around topics such as Brownian excursion
theory and Kingman's theory of partition structures as developed in Aldous (1985) and Pitman (1995a).
Returning to a Bayesian setting, Ishwaran and James (2001a) recently developed the calculus for a class of
species sampling mixture models analogous to and extending the model of Lo (1984), based on the work of
Pitman (1995a, 1996).

The interest here is to extend these ideas to other random measures, not necessarily mixture models,
via more general partition structures. The question of how to obtain information via as yet possibly
unknown partition structures related to classes of random measures suggests that one may need considerable
expertise in combinatorial calculations. Here, this issue is circumvented by usage of what is referred to as a
Poisson Process Partition Calculus. Note that Poisson Palm calculus is employed in Perman, Pitman and
Yor (1992) and Pitman and Yor (1992) [see also Fitzsimmons, Pitman and Yor (1992)]. The intersection
with those results manifests itself in Section 5.

One may infer from Lo (1984) and Lo and Weng (1989) that Bayesian infinite-dimensional calculus is a
calculus based on the disintegration of joint product structures on abstract spaces which exploits properties of
partitions $p$ or some other separating class. Examples of other separating classes, which will not be discussed,
are the s-path models found in Brunner and Lo (1989) and the classification based methods for generalised
Dirichlet stick-breaking models discussed in Ishwaran and James (2001b). Specifically these ideas are applied
to classes of boundedly finite random measures, say $\mu$, on complete and separable spaces [see Daley and
Vere-Jones (1988)] which are linked to Poisson random measures. I will mention early that this is not synonymous
with the notion of stochastic integration which suggests simply a marginalization over the infinite-dimensional
component, although these types of ideas will play a role. Here the primary interest is in the derivation and
various characterizations of the joint structure in terms of an infinite-dimensional posterior law of $\mu$ and
its marginal components. A key aspect of disintegration of measures on Polish spaces is the availability of
a well defined Fubini's theorem. The notion of Bayesian models and disintegrations is made quite clear in
Le Cam (1986, Chapter 12). The term Bayesian is meant primarily in terms of technique. The treatment
of problems is more in line with a broader point of view such as in Kingman (1975) rather than Ferguson
(1973). Of course I shall cover quite thoroughly models which arise in Bayesian nonparametrics. Readily
accessible general discussions on disintegrations of measures may be found in Pollard (2001) and Kallenberg
(1997). See also Blackwell and Maitra (1984), Dellacherie and Meyer (1978), and Pachl (1978). Additionally,
Daley and Vere-Jones (1988), Matthes, Kerstan and Mecke (1978) and Kallenberg (1986) provide details
about Fubini's theorem for random measures cast within the language of Palm calculus.

1.1 Basic principles and motivation 

For motivation the typical but rich mixture model setup is described. Let $K$ denote a non-negative integrable
kernel on a complete and separable space $\mathcal{X} \times \mathcal{Y}$ and let $\mu$ denote a boundedly finite discrete random measure
on $\mathcal{Y}$. A mixture model is defined as follows

\[ f(X|\mu) = \int_{\mathcal{Y}} K(X|Y)\,\mu(dY) \tag{1} \]

for $X$ varying in $\mathcal{X}$. When $\mu$ is a Dirichlet process, a two-parameter Poisson-Dirichlet process or a weighted
gamma process then we are in the setting described earlier. When $K$ is chosen to be a density such as a normal
kernel and $\mu$ is a probability measure then $f$ is a random density. This presents one way to describe random
measures over spaces of densities and is analogous to the idea behind classical density estimation where one
convolves an empirical distribution function with a kernel. A similar construct holds for random hazard rates
which might be useful in models in a multiplicative intensity setting. However, the fact that the kernel may be
specified rather arbitrarily leads to the description of a large body of models which appear in nonparametric
statistics, spatial modelling and general inverse problems. Letting $K(A_{ij}, Y) := I\{Y \in A_{ij}\}$ corresponds
to partially observed models such as interval censoring, right censoring and double censoring described, from
a frequentist viewpoint, in Turnbull (1976) and Groeneboom and Wellner (1992). See Lindsay (1995) and
Groeneboom (1996) for much more general models. As discussed in Lo (1984) and Lo and Weng (1989)
one can induce specific shapes such as the class of monotone densities or hazards via a uniform kernel or
completely monotone models via mixtures of exponential kernels. In statistical terms, when $\mu$ is a probability
measure $P$, the general mixture model has an interpretation where $X|Y,P$ is $K(X|Y)$ and $Y|P$ is missing
information with distribution $P$. In spatial statistics, Wolpert and Ickstadt (1998b) propose to specify $\mu$ as
a Levy random field where (1) may represent the intensity of a Poisson process. Brix (1999) proposes a
class of generalized gamma shot-noise processes which is a flexible class of such models. Such models are
used rather than a raw Levy-Cox process to introduce spatial dependence. The term Levy-moving average
process has been used in analogy to Gaussian moving averages where an example is a model of the form
$\int K(x - y)\,\mu(dy)$ or more specifically



which is reminiscent of a stationary Ornstein-Uhlenbeck process. Barndorff-Nielsen and Shephard (2001)
propose the usage of non-Gaussian Ornstein-Uhlenbeck processes which also can be viewed as a mixture
representation where $\mu$ is a Levy process. See also Le Cam (1961) for an early mathematical discussion
of random measures and shot-noise models. While such models are indeed rich in terms of flexibility and
diverse in terms of applications, many questions remain open about their properties. Suppose that now
one has the joint product measure of $\{X_1, \ldots, X_n, \mu\}$



\[ \prod_{i=1}^{n}\left[\int_{\mathcal{Y}} K(X_i|Y_i)\,\mu(dY_i)\right]\mathcal{P}(d\mu) \tag{3} \]

which arises in a variety of contexts. The quantity above represents one possible disintegration of the joint
structure $\{X_1, \ldots, X_n, \mu\}$. Most often direct evaluation of $\{X_1, \ldots, X_n, \mu\}$ is not simple or practically
implementable. Stripping away the kernel $K$ one is left with the joint product structure $\{Y_1, \ldots, Y_n; \mu\}$,
which due to the versatility of an available Fubini's theorem becomes the main object of interest. That is,
knowledge of this structure reduces the problem of the mixture model above to a special case of a cadre of
possibilities. Hence the goal is to find the following disintegration



\[ \mathcal{P}(d\mu)\prod_{i=1}^{n}\mu(dY_i) := \mathcal{P}(d\mu|Y)\,M_\mu(dY_1, \ldots, dY_n) \tag{4} \]



where $M_\mu$ is a possibly sigma-finite joint moment measure of $Y$ and $\mathcal{P}(d\mu|Y)$ can be thought of as the
posterior distribution of $\mu|Y$. However, the result is best understood and its utility is revealed by an equivalent






statement via Fubini's theorem,

\[ \int g(Y,\mu)\prod_{i=1}^{n}\mu(dY_i)\,\mathcal{P}(d\mu) := \int g(Y,\mu)\,\mathcal{P}(d\mu|Y)\,M_\mu(dY_1, \ldots, dY_n) \tag{5} \]

for $g$ an integrable or positive function. As is certainly known it is sufficient to check for $g$ specified to be
an indicator of appropriate cylinder sets or other characterizing function. The structure $M_\mu(dY_1, \ldots, dY_n)$
is an urn-type structure that can be generated sequentially via conditional moment measures. If $\mu$ is a
probability measure the notion of conditional moment measures is synonymous with the notion of Bayesian
prediction rules. However the exchangeable urn structure becomes a bit unwieldy and one seeks a further
disintegration of $Y$. A natural one is based on the often quite informative decomposition $Y := (Y^*, p)$ where
$Y^* = \{Y_1^*, \ldots, Y^*_{n(p)}\}$ are the unique random values given a random partition $p$ of the integers $\{1, \ldots, n\}$.
In other words the joint (sigma-finite) measure admits a disintegration in terms of a conditional measure on
its unique values and a measure on $p$, neither of which need be a proper probability measure.

The main point boils down to the following basic principle: suppose that a random measure $\mu^*$ is some
function of $\mu$, i.e. $\mu^* = g(\mu)$. Then via (5) its posterior law, marginal law and partition structure can all be
derived from the corresponding aspects of $\mu$. This is of course provided that one has explicit information
about $\mu$. The utility of such a procedure for $\mu$ is then amplified by its richness, that is, by a measure of how
many interesting processes $\mu^*$ it can capture. The interesting aspect of this is that the richness of
$\mu$ must be matched by the simplicity of its posterior laws, marginal moment and partition structures, while
still being informative. The structures must indeed act in a way like canonical basis functions. In other words
complex structures can be derived from simple ones. The Poisson random measure, $N$, emerges as a natural
candidate given its prevalence in various theories of random measures and its basic connections (via the
Poisson random variable and Bell's number) to random partitions of the integers. Although there is a duality
with an approach using combinatorial arguments, an exploitation of the Poisson random measure analogue of
the disintegration above, with a further partition disintegration, allows one to proceed in a pure framework of
disintegration of measures to directly derive many aspects of large classes of measures $\mu$.

The notation $\sum_p$ will be used to denote the sum over all partitions of the integers $\{1, \ldots, n\}$. As is
well known [see Rota (1964)], the number of terms in this sum is Bell's number. For papers which discuss the natural
relationships of the Poisson process/random variable to partitions, see for instance Constantine and Savits
(1994), Pitman (1997b), Constantine (1999) and Di Nardo and Senato (2001). The papers by Constantine and
Savits (1994) and Constantine (1999), and references therein, are certainly related to this one. Constantine
and Savits (1994) discuss methods to evaluate identity/moment formulae for compound Poisson processes
via Faà di Bruno's formula. Certainly one can infer from Theorem 2.1 of Constantine and Savits (1994) that
many of the formulae here, expressed in terms of $\sum_p$, can be re-expressed in terms of infinite-sum notation
related to Dobinski's formula or more obviously cycle notation.
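
To make the size of a sum over partitions concrete, the following minimal sketch enumerates all partitions of $\{1, \ldots, n\}$ and compares the count with a truncated form of Dobinski's formula, $B_n = e^{-1}\sum_{k \geq 0} k^n/k!$. The function names and the truncation length are illustrative assumptions and are not part of the original development.

```python
from math import exp, factorial


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or open a new block containing only `first`.
        yield [[first]] + partition


def bell_by_enumeration(n):
    """Number of terms in a sum over all partitions of {1, ..., n}."""
    return sum(1 for _ in set_partitions(list(range(1, n + 1))))


def bell_by_dobinski(n, terms=60):
    """Truncated Dobinski series e^{-1} * sum_k k^n / k! (hypothetical cut-off)."""
    return exp(-1.0) * sum(k ** n / factorial(k) for k in range(terms))


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, bell_by_enumeration(n), round(bell_by_dobinski(n), 6))
```

The rapid growth of these counts is the reason brute-force evaluation of a partition sum is only feasible for small $n$.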

1.2 Notation and preliminaries 

Again let $p = \{C_1, \ldots, C_{n(p)}\}$ denote a partition of size $n(p)$ of the integers $\{1, \ldots, n\}$, and let $e_{j,n}$ denote the
cardinality of each cell $C_j$ for $j = 1, \ldots, n(p)$. This partition structure is related to a description of general
analogues of a Chinese restaurant scheme to generate partitions described in terms of a sequential seating of
customers [see Aldous (1985), Pitman (1996) and Kerov (1998)]. The results will be closely connected to such
a structure generated from the exchangeable partition probability function (EPPF) (partition distribution).
See Pitman (1995a, b, 1996) for a thorough description of the EPPF concept. Additionally, for $r \geq 1$, let
$p_r = \{C_{1,r}, \ldots, C_{n(p_r),r}\}$ denote a partition of $\{1, 2, \ldots, r\}$, where $C_{i,r}$ denotes the current configuration of
table $i$ after $r$ customers have been seated and $e_{j,r}$ denotes the number of customers seated at $C_{j,r}$. The
partition $p_{r+1}$ then denotes the (updated) one step larger partition on $\{1, 2, \ldots, r + 1\}$.

Now specific notation is given for models which shall be looked at in some detail, that is, for the
two-parameter Poisson-Dirichlet models and the closely connected generalised gamma family of random measures.
In addition notation is given for a general spatial variation of the Beta process of Hjort (1990). First, we
briefly describe the two-parameter Poisson-Dirichlet class of models. See Pitman and Yor (1997) and Pitman
(1996) for more details. Let $(Z_i)$ denote a collection of iid random variables whose distribution is a diffuse
probability measure $H$ and, independently of $(Z_i)$, let $(P_i)$ denote a collection of ranked probabilities which
sum to one and have a two-parameter Poisson-Dirichlet distribution denoted as $PD(\alpha, \theta)$ with parameter
values $0 \leq \alpha < 1$ and $\theta > -\alpha$. The corresponding random probability measure has a representation,

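\[ P_{\alpha,\theta}(\cdot) := \sum_{i=1}^{\infty} P_i\,\delta_{Z_i}(\cdot). \tag{6} \]

The law of $P_{\alpha,\theta}$, denoted $\mathcal{PD}_{\alpha,\theta}(dP|H)$, is uniquely associated with its prediction rule and exchangeable
partition probability function (EPPF) given as,

\[ \mathbb{P}(Y_{n+1} \in \cdot \,|\, Y_1, \ldots, Y_n) = \frac{\theta + n(p)\alpha}{\theta + n}\,H(\cdot) + \sum_{j=1}^{n(p)} \frac{e_{j,n} - \alpha}{\theta + n}\,\delta_{Y_j^*}(\cdot) \tag{7} \]

and,

\[ PD(p|\alpha,\theta) = \frac{\left[\prod_{j=1}^{n(p)-1}(\theta + j\alpha)\right]\prod_{j=1}^{n(p)}\frac{\Gamma(e_{j,n} - \alpha)}{\Gamma(1 - \alpha)}}{\prod_{j=1}^{n-1}(\theta + j)}. \tag{8} \]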
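
As a numerical illustration of the two-parameter EPPF, the sketch below evaluates the standard Pitman form of the partition probability, a product of $(\theta + j\alpha)$ terms and ratios $\Gamma(e_j - \alpha)/\Gamma(1 - \alpha)$ divided by $\prod_{j=1}^{n-1}(\theta + j)$, and checks that it sums to one over all partitions of $\{1, \ldots, n\}$. The function names and the parameter values are illustrative assumptions.

```python
from math import gamma, prod


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        yield [[first]] + partition


def pd_eppf(block_sizes, alpha, theta):
    """Two-parameter Poisson-Dirichlet EPPF for a partition with the given cell sizes.

    Assumes 0 <= alpha < 1 and theta > -alpha.
    """
    n = sum(block_sizes)
    k = len(block_sizes)
    numerator = prod(theta + j * alpha for j in range(1, k))
    numerator *= prod(gamma(e - alpha) / gamma(1.0 - alpha) for e in block_sizes)
    denominator = prod(theta + j for j in range(1, n))
    return numerator / denominator


def eppf_total(n, alpha, theta):
    """Sum of the EPPF over all partitions of {1, ..., n}; should equal one."""
    return sum(
        pd_eppf([len(block) for block in partition], alpha, theta)
        for partition in set_partitions(list(range(1, n + 1)))
    )


if __name__ == "__main__":
    print(eppf_total(5, alpha=0.3, theta=1.2))
    # alpha = 0 recovers the Ewens sampling formula of the Dirichlet process.
    print(pd_eppf([2, 1], alpha=0.0, theta=1.0))
```

Setting `alpha=0.0` gives the Dirichlet process case, and `theta=0.0` with `0 < alpha < 1` gives the normalised stable case.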

The extreme cases are the normalised stable law process $P_{\alpha,0}$ and the Dirichlet process $P_{0,\theta}$ with shape
parameter $\theta H$. The laws of the $(P_i)$ and the Ewens sampling formula EPPF for the Dirichlet process are denoted
as $PD(\theta)$ and $PD(p|\theta)$. Similarly for the stable process write $PD(\alpha)$ and $PD(p|\alpha)$. Now for $b \geq 0$ the rich
family of generalized gamma random measures [see Brix (1999)] is generated by the Levy measure

\[ \rho_{\alpha,b}(ds) = \frac{1}{\Gamma(1-\alpha)}\,s^{-\alpha-1}e^{-bs}\,ds, \tag{9} \]

which includes the stable law subordinator, $b = 0$, the gamma process subordinator, $\alpha = 0$, and the
inverse-Gaussian law, $\alpha = 0.5$, $b > 0$, among others. The notation $\rho_\alpha := \rho_{\alpha,0}$ will be reserved for the stable law and
the choice $\theta\rho_{0,1}$ will be used to generate the Dirichlet process family of models. The general subordinator
has increments which have a distribution belonging to an exponential family of distributions with a power
variance function, introduced by Tweedie (1984) and further discussed in Hougaard (1986), Bar-Lev and Enis
(1996), and Jørgensen (1997). See also Küchler and Sørensen (1997). Classes of compound Poisson process models
based on this distribution are discussed in Aalen (1992), Lee and Whitmore (1993) and Hougaard, Lee and
Whitmore (1997). Lastly, a spatial version of Hjort's (1990) (two-parameter) Beta process corresponds to
the Levy process generated by the inhomogeneous Levy measure,

\[ u^{-1}(1-u)^{c(s)-1}\,du\,A_0(ds,dx) \tag{10} \]

for $(u, s, x) \in (0, 1] \times (0, \infty) \times \mathcal{X}$. The quantity $c(s)$ is a decreasing function on $(0, \infty)$ and $A_0$ is a hazard
measure. The symbols $\mathcal{G}(a, b)$ and $\beta(a, b)$ will be used to denote gamma and beta random variables respectively.
$\mathcal{G}(dx|a, b)$ will denote a gamma density.

Some other references connected to the Poisson-Dirichlet family, not mentioned later, include McCloskey (1965),
Engen (1978), Carlton (1999), Donnelly and Tavaré (1987), Gyllenberg and Koski (2001). See the article by
Ewens and Tavaré (1997) for a discussion of the wide applicability of the two-parameter model. See also
Pitman (1995b), which will be referenced later.



2 Poisson Process Partition Calculus 

Let $N$ denote an inhomogeneous Poisson process (measure) on a complete and separable space $\mathcal{X}$ with
(diffuse) mean measure $\nu(\cdot)$. That is, the Laplace functional of $N$ is of the form

\[ \mathcal{L}_N(f|\nu) = \int_{\mathcal{M}} \exp\left\{-\int_{\mathcal{X}} f(x)N(dx)\right\}\mathcal{P}(dN|\nu) = \exp\left(-\int_{\mathcal{X}} \left(1 - e^{-f(x)}\right)\nu(dx)\right) \]

for non-negative functions $f \in BM(\mathcal{X})$ on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ where $BM(\mathcal{X})$ denotes the collection of measurable
functions of bounded support on $\mathcal{X}$. See Daley and Vere-Jones (1988) for a description of these concepts.
For brevity we use the shorthand notation of the type,

\[ e^{-N(f)} = \exp\left(-\int_{\mathcal{X}} f(x)N(dx)\right). \]

The exposition of this paper centers around the utilization of disintegration results related to the joint
measure

\[ \mathcal{P}(dN|\nu)\prod_{i=1}^{n} N(dX_i), \tag{11} \]

where (11) represents a disintegration of the joint product measure of $\{X_1, \ldots, X_n, N\}$. Moreover, the
collection $X = \{X_1, \ldots, X_n\}$ can be considered as conditionally independent given $N$. Importantly, however, once
integration is done over $N$ the collection $X$ will usually contain tied values. It follows that one can always
represent $X = (X^*, p)$ where $X^* = \{X_1^*, \ldots, X^*_{n(p)}\}$ denotes the unique values and $p$ dictates which variables
are equal according to the relationship $X_i = X_j^*$ if and only if $i \in C_j$. The main purpose of this section is to
describe two results concerning the Poisson process which are fashioned as tools to be tailor-made to solve a
variety of problems in an expeditious manner.

2.1 Basic tools 

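First an (exponential) change of measure or disintegration formula based on Laplace functionals is given
below. Such an operation is commonly called exponential tilting.

Lemma 2.1 For non-negative functions $f \in BM(\mathcal{X})$ on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ and $g$ on $(\mathcal{M}, \mathcal{B}(\mathcal{M}))$

\[ \int_{\mathcal{M}} g(N)e^{-N(f)}\,\mathcal{P}(dN|\nu) = \mathcal{L}_N(f|\nu)\int_{\mathcal{M}} g(N)\,\mathcal{P}(dN|e^{-f}\nu), \]

where $\mathcal{P}(dN|e^{-f}\nu)$ is the law of a Poisson process with intensity

\[ e^{-f(x)}\nu(dx). \]

In other words the following absolute continuity result holds,

\[ e^{-N(f)}\,\mathcal{P}(dN|\nu) = \mathcal{L}_N(f|\nu)\,\mathcal{P}(dN|e^{-f}\nu). \tag{12} \]

PROOF. By the uniqueness of the Laplace functional for random measures on $\mathcal{X}$ it suffices to check this
result for the case $g(N) = e^{-N(h)}$. Thus it follows that,

\[ \int_{\mathcal{M}} e^{-N(f+h)}\,\mathcal{P}(dN|\nu) = \mathcal{L}_N(f|\nu)\int_{\mathcal{M}} e^{-N(h)}\,\mathcal{P}_f(dN) \]

where for the time being $\mathcal{P}_f$ denotes some law on $N$. Simple algebra shows that

\[ \int_{\mathcal{M}} e^{-N(h)}\,\mathcal{P}_f(dN) = \frac{\mathcal{L}_N(f+h|\nu)}{\mathcal{L}_N(f|\nu)} = \exp\left(-\int_{\mathcal{X}} \left(1 - e^{-h(x)}\right)e^{-f(x)}\nu(dx)\right), \]

which is the Laplace functional of a Poisson process with mean measure $e^{-f}\nu$,
and hence $\mathcal{P}_f(dN) := \mathcal{P}(dN|e^{-f}\nu)$ which concludes the result. ■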
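
The change of measure can be checked numerically in the simplest scalar case. The sketch below assumes that $f = t\,I_B$ for a single bounded set $B$, so that $N(f) = tN(B)$ with $N(B)$ a Poisson random variable of mean $\lambda = \nu(B)$; the tilted law is then Poisson with mean $\lambda e^{-t}$ and the Laplace functional is $\exp(-\lambda(1 - e^{-t}))$. The test function `g`, the parameter values and the series truncation are illustrative assumptions.

```python
from math import exp


def poisson_expectation(g, mean, terms=150):
    """E[g(N)] for N ~ Poisson(mean), by a truncated series with a pmf recurrence."""
    total, pmf = 0.0, exp(-mean)
    for k in range(terms):
        total += g(k) * pmf
        pmf *= mean / (k + 1)  # p_{k+1} = p_k * mean / (k + 1)
    return total


def tilting_sides(g, lam, t):
    """Return both sides of the scalar exponential tilting identity."""
    # Left side: E_lam[ g(N) * exp(-t N) ].
    lhs = poisson_expectation(lambda k: g(k) * exp(-t * k), lam)
    # Right side: Laplace transform times expectation under the tilted mean.
    laplace = exp(-lam * (1.0 - exp(-t)))
    rhs = laplace * poisson_expectation(g, lam * exp(-t))
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = tilting_sides(lambda k: k ** 2 + 1.0, lam=3.0, t=0.7)
    print(lhs, rhs)
```

The two printed values should agree up to floating point error, which is the scalar shadow of the absolute continuity statement.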

REMARK 1. Lemma 2.1 is a simple functional extension, modulo the Gaussian and drift components, of an
analogous result for Levy processes on $\mathbb{R}$ or more generally $\mathbb{R}^d$ which may be found in Küchler and Sørensen
(1997), Proposition 2.1.3. The utility of Lemma 2.1 will be demonstrated throughout. Poisson processes with
laws described by $\mathcal{P}(dN|e^{-f}\nu)$ can be found in Pitman and Yor (1992) [see Section 5 of this manuscript].

REMARK 2. Naturally Lemma 2.1 extends to the following somewhat more vague generalisation. Suppose
that $\mu$ is a random measure with Laplace functional $\mathcal{L}_\mu$. Then given the setup in Lemma 2.1 with $\mu$ in
place of $N$, $e^{-\mu(f)}\,\mathcal{P}(d\mu) = \mathcal{L}_\mu(f)\,\mathcal{P}_f(d\mu)$ where $\mathcal{P}_f$ is characterized by its Laplace functional

\[ \frac{\mathcal{L}_\mu(f+h)}{\mathcal{L}_\mu(f)}. \]

One can replace the argument with characteristic functionals.



Results which identify the disintegration of (11) in terms of the posterior distribution of the Poisson
process and the marginal joint measure

\[ M(dX_1, \ldots, dX_n) = \int_{\mathcal{M}} \left[\prod_{i=1}^{n} N(dX_i)\right]\mathcal{P}(dN|\nu) \tag{13} \]

are well known in the literature via Palm calculus for Poisson processes. The quantity $M$ in (13) is also known
as the joint moment measure. These existing results are customized in Lemma 2.2 below where emphasis is
placed on the partition structure.



Lemma 2.2 Let $g$ be a non-negative or integrable function on $\mathcal{X}^n \times \mathcal{M}$, then for each $n \geq 1$

\[ \int g(X, N)\prod_{i=1}^{n} N(dX_i)\,\mathcal{P}(dN|\nu) = \sum_{p}\int g\Big(X^*, p, N + \sum_{j=1}^{n(p)}\delta_{X_j^*}\Big)\,\mathcal{P}(dN|\nu)\prod_{j=1}^{n(p)}\nu(dX_j^*) \tag{14} \]

with

\[ \int_{\mathcal{M}} g\Big(X^*, p, N + \sum_{j=1}^{n(p)}\delta_{X_j^*}\Big)\,\mathcal{P}(dN|\nu) = \int_{\mathcal{M}} g(X^*, p, N)\,\mathcal{P}(dN|\nu, X). \tag{15} \]

The moment measure $M$ is also expressible via the conditional moment measures as,

\[ M(dX) = \nu(dX_1)\prod_{i=2}^{n}\left[\nu(dX_i) + \sum_{j=1}^{n(p_{i-1})}\delta_{X_j^*}(dX_i)\right]. \tag{16} \]

REMARK 3. It is of course true that Lemma 2.2 is not entirely novel. However, the partition representation
that is used is certainly not readily seen in the literature. Moreover, it has been tailor-made to
serve its present purpose as a general tool. One way to deduce the partition representation is to examine
carefully Daley and Vere-Jones (1988) [equation (5.517), Lemma 5.2.VI, and the discussion on page 192]. A
simple-minded but informative approach is to simply refer back to the case of Poisson random variables. For
clarity and also to showcase what are believed to be interesting side results involving partly Lemma 2.1 we
prove this result in its entirety in the next section using alternate means.

2.2 Supporting results 

Note that Lemma 2.1 implies the following result for each bounded set $B$,

\[ \int_{\mathcal{M}} N(B)\exp\left\{-\int_{\mathcal{X}} f(x)N(dx)\right\}\mathcal{P}(dN|\nu) = \mathcal{L}_N(f|\nu)\int_{B} e^{-f(x)}\nu(dx). \tag{17} \]

This is reminiscent of the expression which appears in Lemma 10.6 in Kallenberg (1986) and perhaps
more clearly in Proposition 12.1.V in Daley and Vere-Jones (1988). That is, the expression (17) identifies
the conditional Laplace functional of the Poisson process given one observation [see Daley and Vere-Jones
(1988), p. 458]. A point to note is that in contrast to Daley and Vere-Jones (1988) Proposition 12.1.V, this
result is not obtained by taking derivatives. This suggests that Lemma 2.1 can be used repeatedly to obtain
the conditional Laplace functional given $n$ observations, thus providing an alternative to an argument using
repeatedly, say, Lemma 12.1.V. The general dual of Lemma 12.1.V can be deduced from Remark 2 as follows.
Suppose that $\mu$ is a random measure (as in Remark 2) with finite first moment measure, say $m_\mu$, then

\[ \int_{\mathcal{M}} \mu(B)e^{-\mu(f)}\,\mathcal{P}(d\mu) = \mathcal{L}_\mu(f)\int_{B}\int_{\mathcal{M}} \mu(dx)\,\mathcal{P}_f(d\mu) := \mathcal{L}_\mu(f)\int_{B} r(f|x)\,m_\mu(dx) \tag{18} \]

for some function $r$ determined by the second expression, that is, by the evaluation of $\int_B\int_{\mathcal{M}} \mu(dx)\,\mathcal{P}_f(d\mu)$.
Hence the conditional Laplace functional of $\mu|x$ is

\[ \mathcal{L}_\mu(f|x) := \mathcal{L}_\mu(f)\,r(f|x). \]

This general form can be applied repeatedly to (conditional) random measures $\mu|x_1, \ldots, x_i$ etc., where the
requirement is the existence of a finite conditional measure $m_\mu(\cdot|x_1, \ldots, x_i)$. All such results can be deduced
from an argument similar to what is used in Proposition 2.1 below.

REMARK 4. A result for general $n$ is quite applicable. As an example consider finite random mixtures
of infinitely divisible random variables. That is,

\[ \sum_{k=1}^{m} W_{k,m}\,\delta_{Z_k} \tag{19} \]

where the $W_{k,m}$ are iid infinitely divisible random variables and the $Z_k$ are iid random variables. Such models can
be used as approximations to many of the models discussed here. The emphasis on the change of measure
interpretation should also prove useful. In fact, for such an infinitely divisible class the results in Section 3
apply with small modification.

REMARK 5. James (2001a, b), using the analogue of Lemma 2.1 for weighted gamma and generalised
weighted gamma processes, obtained their posterior characterizations in this manner without any specific
mention of Poisson processes. The idea for this approach is based on an extension of the arguments in Lo and
Weng (1989). In Section 3 it is shown that Lemma 2.1 actually implies these analogues.



Proposition 2.1 Lemma 2.1 implies that the conditional Laplace functional of $N|X_1, \ldots, X_n$ based on the
model (11) is,

\[ \mathcal{L}_N(f|\nu)\prod_{j=1}^{n(p)} e^{-f(X_j^*)}. \tag{20} \]

PROOF. The result proceeds by induction. Let $s, f \in BM(\mathcal{X})$ and choose $g(N) = \int_{\mathcal{X}} s(v)N(dv)$. The
case for $n = 1$ follows from (17). For general $n = r$ it follows from Lemma 2.1 that the conditional Laplace
functional of $N$ given $(\mathbf{X}_r, X_{r+1})$, with $\mathbf{X}_r := (X_1, \ldots, X_r)$, is determined by the expression

\[ \mathcal{L}_N(f|\nu)\prod_{j=1}^{n(p_r)} e^{-f(X_j^*)}\left[\int_{\mathcal{X}} s(X_{r+1})e^{-f(X_{r+1})}\nu(dX_{r+1}) + \sum_{j=1}^{n(p_r)} s(X_j^*)\right]. \]

Now define a function $t(X_{r+1})$ to be $e^{-f(X_{r+1})}$ if $X_{r+1}$ is not equal to any of $\{X_1^*, \ldots, X^*_{n(p_r)}\}$ and set it to be one
otherwise. Then,

\[ \int_{\mathcal{X}} s(X_{r+1})\,t(X_{r+1})\left[\nu(dX_{r+1}) + \sum_{j=1}^{n(p_r)}\delta_{X_j^*}(dX_{r+1})\right] = \int_{\mathcal{X}} s(X_{r+1})e^{-f(X_{r+1})}\nu(dX_{r+1}) + \sum_{j=1}^{n(p_r)} s(X_j^*). \]

Hence, the conditional Laplace functional is,

\[ \mathcal{L}_N(f|\nu)\prod_{j=1}^{n(p_r)} e^{-f(X_j^*)}\;t(X_{r+1}) = \mathcal{L}_N(f|\nu)\prod_{j=1}^{n(p_{r+1})} e^{-f(X_j^*)} \tag{21} \]

as desired. ■

REMARK 6. The proof of Proposition 2.2 below follows closely an unpublished proof by Albert Y. Lo
for the case of gamma processes. That is, it is an alternate proof for Lemma 2 in Lo (1984) which yields
the appropriate partition representation for integrals with respect to a Blackwell-MacQueen urn distribution
derived from a Dirichlet process. The style of proof exploits properties of partitions similar to those stated
in Pitman (1995a, Proposition 10). In particular see (24) below. See also the proof of Lemma 5 in Hansen
and Pitman (2001) for general species sampling models. Details in the proof of Proposition 2.2 translate into
generalizations of a weighted Chinese restaurant algorithm given in the next section. Propositions 2.1 and 2.2
combine to yield Lemma 2.2.



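Proposition 2.2 For $i = 1, \ldots, n$, let $g_i$ be non-negative functions in $BM(\mathcal{X})$ then,

\[ \int_{\mathcal{M}}\left[\prod_{i=1}^{n}\int_{\mathcal{X}} g_i(x_i)N(dx_i)\right]\mathcal{P}(dN|\nu) = \sum_{p}\prod_{j=1}^{n(p)}\int_{\mathcal{X}}\left[\prod_{i \in C_j} g_i(x_j^*)\right]\nu(dx_j^*). \tag{22} \]

Equivalently, $M(dX) = \prod_{j=1}^{n(p)}\nu(dX_j^*)$.

PROOF. The proof of (22) proceeds by induction. The case $n = 1$ is obvious. Now suppose it is true for
$n = r$. Let $p_{r+1}$ denote a partition of $\{1, \ldots, r + 1\}$, and define for each $r > 0$,

\[ \phi_g(p_r) = \prod_{j=1}^{n(p_r)}\int_{\mathcal{X}}\left[\prod_{i \in C_{j,r}} g_i(x_j^*)\right]\nu(dx_j^*). \tag{23} \]

It follows that $\phi_g(p_{r+1})$ is

\[ \phi_g(p_r)\int_{\mathcal{X}} g_{r+1}(v)\,\nu(dv) \]

if $n(p_{r+1}) = n(p_r) + 1$, otherwise if the index $r + 1$ is in an existing cell/table $C_{i,r}$ then it is equivalent to

\[ \phi_g(p_r)\int_{\mathcal{X}} g_{r+1}(v)\,\pi_g(dv|C_{i,r}) \]

where

\[ \pi_g(dv|C_{i,r}) = \frac{\left[\prod_{l \in C_{i,r}} g_l(v)\right]\nu(dv)}{\int_{\mathcal{X}}\left[\prod_{l \in C_{i,r}} g_l(v)\right]\nu(dv)} \]

for $i = 1, \ldots, n(p_r)$. Note that this implies that,

\[ \sum_{p_{r+1}}\phi_g(p_{r+1}) = \sum_{p_r}\phi_g(p_r)\left[\int_{\mathcal{X}} g_{r+1}(v)\,\nu(dv) + \sum_{i=1}^{n(p_r)}\int_{\mathcal{X}} g_{r+1}(v)\,\pi_g(dv|C_{i,r})\right]. \tag{24} \]

Now by (simple algebra) and the induction hypothesis on $r$ it follows that, writing $\mathbf{X}_r := (X_1, \ldots, X_r)$,

\[ \sum_{p_{r+1}}\phi_g(p_{r+1}) = \int_{\mathcal{X}^r}\left[\int_{\mathcal{X}} g_{r+1}(v)\,\nu(dv) + \sum_{j=1}^{n(p_r)} g_{r+1}(X_j^*)\right]\prod_{i=1}^{r}\left[g_i(X_i)\right]M(d\mathbf{X}_r). \]

Now utilizing the fact that $M(d\mathbf{X}_{r+1}) = \left[\nu(dX_{r+1}) + \sum_{j=1}^{n(p_r)}\delta_{X_j^*}(dX_{r+1})\right]M(d\mathbf{X}_r)$ concludes the proof. ■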



REMARK 7. Of course (22) in its most basic form leads to well-known results for moments and cumulants
of a Poisson random variable. For instance, setting the $g_i$ to be indicators of a bounded set $A$ yields,

\[ E[N(A)^n] = \sum_{p}\nu(A)^{n(p)}, \]

where $N(A)$ is a Poisson random variable with mean $\nu(A)$.

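
The moment identity of Remark 7 is easy to check numerically. In the sketch below the sum over partitions is grouped by the number of cells, so that $\sum_p \lambda^{n(p)} = \sum_k S(n,k)\lambda^k$ with $S(n,k)$ the Stirling numbers of the second kind, and the result is compared with a directly computed Poisson moment. The function names and the series truncation are illustrative assumptions.

```python
from math import exp


def stirling2_row(n):
    """Return [S(n,0), ..., S(n,n)]: partitions of {1..n} counted by number of cells."""
    row = [1]  # S(0,0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            # Element m joins one of k existing cells, or opens the k-th cell.
            join_existing = k * row[k] if k < m else 0
            new[k] = join_existing + row[k - 1]
        row = new
    return row


def partition_sum(n, lam):
    """Sum over partitions p of {1..n} of lam ** n(p)."""
    return sum(s * lam ** k for k, s in enumerate(stirling2_row(n)))


def poisson_moment(n, lam, terms=150):
    """E[N^n] for N ~ Poisson(lam), by a truncated series with a pmf recurrence."""
    total, pmf = 0.0, exp(-lam)
    for k in range(terms):
        total += (k ** n) * pmf
        pmf *= lam / (k + 1)
    return total


if __name__ == "__main__":
    print(partition_sum(4, 2.5), poisson_moment(4, 2.5))
```

For $n = 4$ the partition sum is $\lambda + 7\lambda^2 + 6\lambda^3 + \lambda^4$, and at $\lambda = 1$ it reduces to Bell's number.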
2.3 Chinese restaurant like approximation methods 

In this section a new algorithm for approximating complex integrals and in fact posterior distributions is
described. This algorithm works by sequentially sampling from a partition distribution and structurally behaves
similarly to the Chinese restaurant process seating algorithm discussed in Aldous (1985), Pitman (1996) and
Kerov (1998). In particular, the proposed scheme is influenced by the weighted Chinese restaurant (WCR)
procedure developed in Lo, Brunner and Chan (1996) [see also Brunner, Chan, James and Lo (2001)] for
mixtures of Dirichlet process and weighted gamma process posterior models. Ishwaran and James (2001a)
subsequently generalise the WCR to include the class of species sampling mixture models. The essence of all
these algorithms will be revisited in this section.

Notice that the right hand side of (22),

\[ \sum_{p}\prod_{j=1}^{n(p)}\int_{\mathcal{X}}\left[\prod_{i \in C_j} g_i(x_j^*)\right]\nu(dx_j^*), \tag{25} \]

is a function of partitions of the form $\sum_p t(p)$. This result is analogous to Lo (1984) where he points out
that complex multiple integrals with respect to a Blackwell-MacQueen urn are equivalent to considerably
more manageable sums over partitions. However, it is known that the number of partitions
grows like Bell's number as $n$ increases and hence one needs some method to approximate such quantities.
In order to further illustrate a connection to a Chinese restaurant we introduce the following expressions;



(26) 

and 
(27) 



™(p) ™(p) „ 

£0 n(p) n r (e;-a)n / 

p 2=1 3=1 Jy 



n * w) 



n(p) 

^PZ?(p|a,0)n 

P 2=1 



ieCj 



H(dY*). 



The expressions above may arise respectively from a generalized gamma mixture model and a two-parameter
Poisson-Dirichlet process mixture model. The first expression (26) appears in James (2001b) and reduces
to an expression for the gamma process in Lo and Weng (1989) and James (2001a). The latter expression
appears in Ishwaran and James (2001a) and extends the analogous result for the Dirichlet process in
Lo (1984). That is, the latter corresponds to the marginal likelihood of $X|Y$ when $Y_1, \ldots, Y_n|P$ are iid $P$,
the law of P is V a ,e(dP\H). Allowing for more flexibility in the interpretation of gi and v these terms can 
be written as special cases of (^1|). Consequently, an understanding of the mechanism behind the WCR 
algorithm as outlined in Brunner, Chan and Lo (1996) translates into a general algorithm which is now 
described. From (63) set 



(28) 



l(r) = 



„ n(pr) , 

/ g r +\(v)y(dx) + S_. I 9r+i(x)wg(dx\C itr ) 
Jx Jx 



The procedure relies on a method to generate partitions $p$ based on the following rule, described in terms of
customers entering a restaurant.

Algorithm 1 Step 1: Seat the first customer at a table with probability $l(0)/l(0) = 1$.
Step $(r+1)$: Given $p_r$, customer $r+1$ sits at table $C_{j,r}$ with probability

    \[ \mathbb{P}(p_{r+1}|p_r) = l(r)^{-1} \int_{\mathcal{X}} g_{r+1}(x)\,\pi_g(dx|C_{j,r}), \]

where $p_{r+1} = p_r \cup \{r+1 \in C_{j,r}\}$ for $j = 1, \ldots, n(p_r)$. Otherwise, customer $r+1$ sits at a new table
with probability

    \[ \mathbb{P}(p_{r+1}|p_r) = l(r)^{-1} \int_{\mathcal{X}} g_{r+1}(x)\,\nu(dx). \]

The completion of Step $n$ produces a $p = \{C_1, \ldots, C_{n(p)}\} = p_n$, where now $p$ is drawn from a density
$q(p|g)$ whose form is described in Lemma 2.3.

Lemma 2.3 The n-step seating algorithm results in a partition $p$ drawn from a density/distribution
$q(p|g)$ that satisfies

    \[ \mathcal{L}(p|g)\,q(p|g) = \prod_{j=1}^{n(p)} \int_{\mathcal{X}} \prod_{i \in C_j} g_i(x_j^*)\,\nu(dx_j^*), \]

where $\mathcal{L}(p|g) = \prod_{r=1}^{n} l(r-1)$.



PROOF. As in the proof of Proposition 2.2, define $\phi_g(p_r)$ from (23). Now note carefully that,

    \[ \phi_g(p_r)\,\pi_g(dx_j|C_{j,r}) = \nu(dx_j) \prod_{i \in C_{j,r}} g_i(x_j) \times \prod_{l \neq j} \int_{\mathcal{X}} \prod_{m \in C_{l,r}} g_m(x_l^*)\,\nu(dx_l^*). \]

Hence it follows that if $p_{r+1} = p_r \cup \{r+1 \in C_{j,r}\}$, then

(29)    \[ \int_{\mathcal{X}} g_{r+1}(u)\,\pi_g(du|C_{j,r}) = \frac{\phi_g(p_{r+1})}{\phi_g(p_r)} \quad\text{and}\quad \mathbb{P}(p_{r+1}|p_r) = \frac{\phi_g(p_{r+1})}{l(r)\,\phi_g(p_r)}. \]

A similar argument for $p_{r+1}$ forming a new table shows that (29) holds in general. Now notice that since
$p_{r+1}$ contains all the information in $p_r$, the product rule of probability gives

    \[ q(p_n|g) = \mathbb{P}(p_1) \prod_{r=1}^{n-1} \mathbb{P}(p_{r+1}|p_r) = \frac{\phi_g(p_n)}{\mathcal{L}(p_n|g)}, \]

where $\mathbb{P}(p_1) = \phi_g(p_1)/l(0) = 1$. Now setting $p = p_n$ yields the desired result. ■
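
A minimal Python sketch of Algorithm 1 may help fix ideas. It assumes, purely for illustration, that the space is finite, so that the intensity is a list of point masses `nu[x]` and each function g_i is a list `g[i][x]` of strictly positive values, and that the table measure for a block C is the intensity reweighted by the product of the g_i over i in C and then normalised (the reading suggested by (29)). Customers are indexed from 0, and the names `table_integral`, `wcr_draw` and `phi` are hypothetical.

```python
import random
from math import prod

def table_integral(block, g, nu):
    """Sum over x of nu[x] * prod_{i in block} g[i][x] (an integral on a finite space)."""
    return sum(w * prod(g[i][x] for i in block) for x, w in enumerate(nu))

def wcr_draw(g, nu, rng):
    """One pass of the seating rule; returns (partition, q, L)."""
    n = len(g)
    tables = [[0]]                      # customer 0 opens the first table
    q = 1.0                             # probability of the seating path
    L = table_integral([0], g, nu)      # l(0)
    for r in range(1, n):
        # weight for joining table t: integral of g_r against the table measure of t
        w = [table_integral(t + [r], g, nu) / table_integral(t, g, nu) for t in tables]
        w.append(table_integral([r], g, nu))    # weight for opening a new table
        l_r = sum(w)                            # the normaliser l(r)
        j = rng.choices(range(len(w)), weights=w)[0]
        q *= w[j] / l_r
        L *= l_r
        if j == len(tables):
            tables.append([r])
        else:
            tables[j].append(r)
    return tables, q, L

def phi(partition, g, nu):
    """Product over tables of the integrated product of the g_i."""
    return prod(table_integral(t, g, nu) for t in partition)
```

Along any seating path the returned values satisfy `L * q == phi(partition, g, nu)` up to rounding, which is the identity of Lemma 2.3. Averaging `L` over independent draws then gives an estimate of the partition sum in (25).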



Now to approximate terms such as (25), draw an iid sample, say $p^{(1)}, \ldots, p^{(B)}$, of size $B$ from $q(p|g)$ and
use $B^{-1} \sum_{b=1}^{B} \mathcal{L}(p^{(b)}|g)$. The Chinese restaurant algorithm to generate a draw from $PD(p|\alpha,\theta)$ is recovered
by setting $l(r) = \theta + r$, $\int_{\mathcal{X}} g_{r+1}(x)\nu(dx) := \theta + n(p_r)\alpha$, and $\int_{\mathcal{X}} g_{r+1}(x)\pi_g(dx|C_{j,r}) := e_{j,r} - \alpha$. In other words
under this specification $q(p|g) = PD(p|\alpha,\theta)$.
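
The two-parameter special case is easy to sketch in code. The snippet below is an illustration under the stated specification, assuming 0 <= alpha < 1 and theta > -alpha so that every weight is positive. The function names are hypothetical, and `eppf` uses the standard closed form of the two-parameter partition probability.

```python
import random

def crp_draw(n, alpha, theta, rng):
    """Seat n customers by the two-parameter rule; return (table_sizes, path_probability)."""
    sizes = [1]       # the first customer always opens a table
    prob = 1.0        # probability of the realised seating sequence
    for r in range(1, n):
        # an existing table of size e has weight e - alpha, a new table theta + k * alpha
        weights = [e - alpha for e in sizes] + [theta + len(sizes) * alpha]
        total = sum(weights)            # equals theta + r
        j = rng.choices(range(len(weights)), weights=weights)[0]
        prob *= weights[j] / total
        if j == len(sizes):
            sizes.append(1)
        else:
            sizes[j] += 1
    return sizes, prob

def eppf(sizes, alpha, theta):
    """Closed-form partition probability for block sizes `sizes`."""
    n, k = sum(sizes), len(sizes)
    num = 1.0
    for i in range(1, k):
        num *= theta + i * alpha
    for e in sizes:
        for m in range(1, e):
            num *= m - alpha
    den = 1.0
    for m in range(1, n):
        den *= theta + m
    return num / den
```

The path probability accumulated by `crp_draw` coincides with `eppf` evaluated at the resulting table sizes, which is the sense in which the seating rule draws from the partition distribution.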

REMARK 8. The algorithm above is an example of a sequential importance sampling procedure. The 
efficient Dirichlet process algorithm discussed in MacEachern, Clyde and Liu (1999) can be seen as a special
case of the WCR when using a binomial kernel. The Chinese restaurant process structure however is not 
emphasized in that work. There are also analogous MCMC methods which can now readily be deduced from 
the descriptions given in Brunner, Chan, and Lo (1996) or Ishwaran and James (2001a). See Ishwaran and 
James (2001b) and Ishwaran, James and Sun (2001) for applications and ideas for other algorithms. 



3 Size-biased generalizations of completely random measures 

In this section it is shown how specific applications of Lemmas 2.1 and 2.2 yield explicit disintegration results
for a class of random measures which includes completely random measures. One feature of the analysis
reveals that cumulants assume in many respects the role played by the EPPF in posterior calculus for
random probability measures. The present construction is influenced by section 4 of Perman, Pitman and
Yor (1992). The results will be applied throughout.

Let $N$ denote a Poisson process on an arbitrary Polish space $\mathcal{X} = \mathcal{S} \times \mathcal{Y}$ with intensity $\nu(ds,dy) =
\rho(ds|y)\eta(dy)$, for $\rho$ a Levy measure on the Polish space $\mathcal{S}$ depending on $y$ in a fairly arbitrary way and $\eta$ a
sigma-finite (non-atomic) measure on $\mathcal{Y}$. Denote the law of $N$ as $\mathcal{P}(dN|\rho,\eta)$. As in Perman, Pitman and Yor
(1992, section 4) let $h$ denote an arbitrary strictly positive function on $\mathcal{S}$. Furthermore it is assumed that
$h, \rho, \eta$ are selected such that for each bounded set $B$ in $\mathcal{Y}$,

(30)    \[ \int_B \int_{\mathcal{S}} \min(h(s),1)\,\rho(ds|y)\,\eta(dy) < \infty. \]

Now define a random measure $\mu$ on $\mathcal{Y}$ such that it may be represented in a distributional sense as,

(31)    \[ \mu(dy) = \int_{\mathcal{S}} h(s)\,N(ds,dy). \]

The law of $\mu$ is denoted as $\mathcal{P}(d\mu|\rho,\eta)$. When $\rho$ does not depend on $y$, write $\rho(ds|y) := \rho(ds)$. Similar
to Tsilevich, Vershik and Yor (2001) the term homogeneous will sometimes be applied to this special case of
$\rho$ and $\mu$. In the case that $\mathcal{S} = (0,\infty)$ and $h(s) := s$ then, following Kingman (1967, 1993), $\mu$ is a completely
random measure without a deterministic component.

REMARK 9. Related to completely random measures, Ferguson and Klass (1972) discuss constructions
for the class of Levy processes on $(0,\infty)$ without a Gaussian component but allowing for fixed points of
discontinuity. See also Wolpert and Ickstadt (1998a, b), Brix (1999) for recent applications of completely
random measures to spatial statistics. The condition (30) guarantees that $\mu$ in (31) is a boundedly finite
measure in the language of Daley and Vere-Jones (1988, Definition 6.1.I). Theorem 6.3.VIII of that work
discusses the representation of completely random measures. Kallenberg (1997, chapter 10) describes
conditions under which Poisson functionals are finite. Certain aspects of the presentation below are of course
implicit in Kallenberg (1986).

The Laplace functional for $\mu$ can be represented as follows

(32)    \[ \mathcal{L}_\mu(g) = \exp\left( -\int_{\mathcal{Y}} \int_{\mathcal{S}} \left(1 - e^{-g(y)h(s)}\right) \rho(ds|y)\,\eta(dy) \right). \]

As in Perman, Pitman and Yor (1992) the notation $T := \mu(\mathcal{Y})$ will be used to denote an almost surely
finite total mass.
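
As a worked instance of (32), offered as a sketch under assumptions not fixed by the text at this point, take the homogeneous case with $\mathcal{S} = (0,\infty)$, $h(s) = s$ and the gamma Levy measure $\rho(ds) = s^{-1}e^{-s}ds$. The inner integral is then a Frullani integral, and (32) reduces to the familiar Laplace functional of a gamma process with shape measure $\eta$:

```latex
\int_0^\infty \left(1 - e^{-g(y)s}\right) s^{-1} e^{-s}\,ds = \log\bigl(1 + g(y)\bigr)
\quad\Longrightarrow\quad
\mathcal{L}_\mu(g) = \exp\left( -\int_{\mathcal{Y}} \log\bigl(1 + g(y)\bigr)\,\eta(dy) \right).
```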

3.1 Disintegrations and posterior distributions 

Define the following moments with respect to the measure $\rho(ds|y)$: for each fixed $n$ and $y$,

    \[ \kappa_n(\rho|y) = \int_{\mathcal{S}} h(s)^n \rho(ds|y) \quad\text{and}\quad \kappa_n(\rho) = \int_{\mathcal{S}} h(s)^n \rho(ds). \]

Note that $m_\mu(dv|\rho,\eta) = \left[\int_{\mathcal{S}} h(s)\rho(ds|v)\right]\eta(dv)$ denotes the first moment measure of $\mu$.

Now similar to the case of Poisson processes let $\mathcal{P}(d\mu|\rho,\eta)\prod_{i=1}^{n}\mu(dY_i)$ represent a disintegration of
the joint product measure of $\{Y_1, \ldots, Y_n, \mu\}$, where further $\{Y_1, \ldots, Y_n\}$ can be viewed as conditionally
independent given $\mu$. The techniques in Section 2 will now be applied to identify the disintegration which
describes the posterior distribution of $\mu$ given $\{Y_1, \ldots, Y_n\}$, and the marginal joint measure of $\{Y_1, \ldots, Y_n\}$
and its corresponding disintegration $(Y^*, p)$. A description of the posterior law of $\mu$ will now be given. The
result will then be justified in Theorem 3.1.

Now for each $n$, let $\mathcal{P}(d\mu|\rho,\eta,Y)$ denote the conditional law of $\mu$ corresponding to the random measure,

(33)    \[ \mu(\cdot) + \sum_{j=1}^{n(p)} h(J_{j,n})\,\delta_{Y_j^*}(\cdot), \]

where the $J_{j,n}$ are independent random variables each with (conditional) distribution depending on $Y_j^*$,

(34)    \[ \mathbb{P}(J_{j,n} \in ds|\rho,Y_j^*) = \frac{h(s)^{e_{j,n}}\rho(ds|Y_j^*)}{\int_{\mathcal{S}} h(u)^{e_{j,n}}\rho(du|Y_j^*)} = \frac{h(s)^{e_{j,n}}\rho(ds|Y_j^*)}{\kappa_{e_{j,n}}(\rho|Y_j^*)}, \]

and chosen independently of $\mu$ which is $\mathcal{P}(d\mu|\rho,\eta)$.

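The corresponding (conditional) moment measure and Laplace functional for (34) are given by

(35)    \[ m_\mu(dv|\rho,\eta,Y) = m_\mu(dv|\rho,\eta) + \sum_{j=1}^{n(p)} E[h(J_{j,n})|Y_j^*]\,\delta_{Y_j^*}(dv) \]

and

(36)    \[ \mathcal{L}_\mu(g|\rho,\eta,Y) = \mathcal{L}_\mu(g|\rho,\eta) \prod_{j=1}^{n(p)} \int_{\mathcal{S}} e^{-g(Y_j^*)h(s)}\,\mathbb{P}(J_{j,n} \in ds|\rho,Y_j^*). \]

The joint marginal measure of $Y$ is expressible as

    \[ M_\mu(dY|\rho,\eta) = m_\mu(dY_1|\rho,\eta) \prod_{i=2}^{n} m_\mu(dY_i|\rho,\eta,Y_1,\ldots,Y_{i-1}). \]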



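Theorem 3.1 Suppose that $\mu$ is a random measure defined by (31) and assume that $\kappa_n(\rho|y) < \infty$ for each
fixed $y$. Let $g$ be a non-negative or integrable function on $\mathcal{Y}^n \times \mathcal{M}$, then for each $n \geq 1$,

(37)    \[ \int_{\mathcal{M}} \int_{\mathcal{Y}^n} g(Y,\mu) \prod_{i=1}^{n} \mu(dY_i)\,\mathcal{P}(d\mu|\rho,\eta) = \int_{\mathcal{Y}^n} \left[ \int_{\mathcal{M}} g(Y,\mu)\,\mathcal{P}(d\mu|\rho,\eta,Y) \right] M_\mu(dY|\rho,\eta). \]

The expressions in (37) are equivalent to,

(38)    \[ \sum_{p} \int_{\mathcal{Y}^{n(p)}} \left[ \int_{\mathcal{M}} g(Y^*,p,\mu)\,\mathcal{P}(d\mu|\rho,\eta,Y) \right] \prod_{j=1}^{n(p)} \kappa_{e_{j,n}}(\rho|Y_j^*)\,\eta(dY_j^*) \]

and $M_\mu(dY|\rho,\eta) := \prod_{j=1}^{n(p)} \kappa_{e_{j,n}}(\rho|Y_j^*)\,\eta(dY_j^*)$.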



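PROOF. First by definition, $M_\mu$ is completely determined by

    \[ \int_{\mathcal{Y}^n} \prod_{i=1}^{n} g_i(Y_i)\,M_\mu(dY|\rho,\eta) = \int_{\mathcal{M} \times \mathcal{Y}^n} \left[ \prod_{i=1}^{n} g_i(Y_i) \right] \prod_{i=1}^{n} \mu(dY_i)\,\mathcal{P}(d\mu|\rho,\eta) \]

for $g_i$ in $BM(\mathcal{Y})$. But this is equivalent to

    \[ \int_{\mathcal{M}} \prod_{i=1}^{n} \left[ \int_{\mathcal{Y}} \int_{\mathcal{S}} h(s_i) g_i(Y_i)\,N(ds_i,dY_i) \right] \mathcal{P}(dN|\rho,\eta). \]

A direct application of Lemma 2.2, or Proposition 2.2, yields all the desired forms of $M_\mu$. This is seen
immediately by setting $g_i(x_i) = h(s_i)g_i(y_i)$ and replacing $\nu(dx)$ with $\rho(ds|y)\eta(dy)$. Now it simply remains
to show that the conditional Laplace functional of $\mu$ is (36). Since the form of the marginal measure $M_\mu$
is established, Lemma 2.2 now shows that the conditional Laplace functional is obtained by using the fact
that $\mu(g) := \int_{\mathcal{Y}} \int_{\mathcal{S}} h(s)g(y)N(ds,dy)$ and replacing $f(x_j^*)$ in Proposition 2.1 with $g(y_j^*)h(s_j)$. Hence, the
conditional Laplace functional of $\mu$ is equivalent to,

(39)    \[ \int_{\mathcal{S}^{n(p)}} \left[ \int_{\mathcal{M}} e^{-\mu(g)}\,\mathcal{P}(dN|\rho,\eta,s,Y^*) \right] \prod_{j=1}^{n(p)} \mathbb{P}(J_{j,n} \in ds_j|\rho,\eta,Y_j^*). \]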



3.2 Cumulants and moment representations 

The joint marginal measure $M_\mu$ disintegrates into

    \[ M_\mu(dY|\rho,\eta) = \prod_{j=1}^{n(p)} \kappa_{e_{j,n}}(\rho|Y_j^*)\,\eta(dY_j^*), \]

which shows that the possibly sigma-finite measure of $Y$ disintegrates into an appropriate joint measure of
$(Y^*, p)$. Such structures will play a fundamental role throughout. As a simple application the joint structure
can be used to obtain expressions for the moments of the corresponding random variable $\mu(B)$, for $B$ finite,
as follows,

(40)    \[ E[\mu(B)^n|\rho,\eta] := \sum_{p} \prod_{j=1}^{n(p)} \int_B \kappa_{e_{j,n}}(\rho|Y_j^*)\,\eta(dY_j^*), \]

which corresponds to the classical relationship between moments and cumulants. Additionally for integrable
linear functionals $\mu(f_l)$, one might be interested in calculating the joint moments,

    \[ \int_{\mathcal{M}} \left[ \prod_{l=1}^{q} \prod_{i=1}^{n_l} \int_{\mathcal{Y}} f_l(y_{i,l})\,\mu(dy_{i,l}) \right] \mathcal{P}(d\mu|\rho,\eta) \]

for integers such that without loss of generality $n = \sum_{l=1}^{q} n_l$. An application of Theorem 3.1 easily yields,

(41)    \[ E\left[ \prod_{l=1}^{q} \mu(f_l)^{n_l} \,\Big|\, \rho,\eta \right] = \sum_{p} \prod_{j=1}^{n(p)} \int_{\mathcal{Y}} \left[ \prod_{l=1}^{q} f_l(u)^{e_{j,l}} \right] \kappa_{e_{j,n}}(\rho|u)\,\eta(du), \]

where $e_{j,l}$, satisfying $e_{j,n} := \sum_{l=1}^{q} e_{j,l}$, denotes the number of indices associated with $f_l$ in $C_j$. Suppose that
$E[T^n|\rho,\eta] < \infty$; then an important case of (41) is

(42)    \[ E[T^n|\rho,\eta] := \sum_{p} \prod_{j=1}^{n(p)} \int_{\mathcal{Y}} \kappa_{e_{j,n}}(\rho|Y_j^*)\,\eta(dY_j^*). \]

The consequences of this representation will play a major role in Section 5.



EXAMPLE [1]. As an important example consider the generalised gamma Levy measure $\rho_{\alpha,b}$ for
$b > 0$. In this case the $J_{j,n}$ are $\mathcal{G}(e_{j,n} - \alpha, b)$,

(43)    \[ \prod_{j=1}^{n(p)} \kappa_{e_{j,n}}(\rho_{\alpha,b}) := \prod_{j=1}^{n(p)} \Gamma(e_{j,n} - \alpha)\,b^{-(e_{j,n} - \alpha)} = b^{-(n - n(p)\alpha)} \prod_{j=1}^{n(p)} \Gamma(e_{j,n} - \alpha), \]

and hence

(44)    \[ E[\mu(B)^n|\rho_{\alpha,b},\eta] := b^{-n} \sum_{p} b^{n(p)\alpha}\,[\eta(B)]^{n(p)} \prod_{j=1}^{n(p)} \Gamma(e_{j,n} - \alpha). \]

When $\alpha := 0$ and $b = 1$ this expression combined with (40) corresponds to the gamma process with
shape measure $\eta$. In this case, where $\eta(\mathcal{Y}) = \theta$ is finite, it follows that normalising the expression (43)
by $E[T^n|\rho_{0,1},\eta]$ yields the EPPF, $PD(p|\theta)$, of the Dirichlet process $\mathcal{PD}_{0,\theta}(dP|H)$. Otherwise in the
un-normalised case one obtains expressions for the generalised gamma random measure in Brix (1999). In the
weighted version of this model, discussed in James (2001b), set $b := b(Y_j^*)$, which reduces to expressions for
the weighted gamma process when $\alpha = 0$ in Lo and Weng (1989). Note that although dividing by $E[T^n|\rho_{\alpha,b},\eta]$
yields a proper distribution for $p$ it is not an EPPF except for the case of the Dirichlet process.
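
The partition sum in (44) can be evaluated by brute force for small n. The sketch below is an illustration only: it enumerates all set partitions, evaluates the right-hand side of (44) with `mass` standing for the value of the shape measure on B, and, in the gamma case (alpha = 0, b = 1), compares the result with the rising factorial moment of a gamma random variable. All function names are hypothetical.

```python
from math import gamma, prod

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # put `first` into each existing block in turn ...
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        # ... or into a block of its own
        yield [[first]] + part

def moment_from_cumulants(n, mass, alpha=0.0, b=1.0):
    """Partition sum on the right-hand side of (44)."""
    total = 0.0
    for part in set_partitions(list(range(n))):
        k = len(part)
        total += b ** (k * alpha - n) * mass ** k * prod(gamma(len(c) - alpha) for c in part)
    return total

def gamma_moment(n, mass):
    """E[G**n] for G ~ Gamma(shape=mass, scale=1): the rising factorial of mass."""
    return prod(mass + m for m in range(n))
```

The enumeration grows like the Bell numbers, so this direct evaluation is only practical for small n; that is the motivation for the sampling scheme of Section 2.3.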



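EXAMPLE [2]. For the Beta process, with parameters $c$, $A$, the $J_{j,n}|Y_j^*$ are $\mathcal{B}(e_{j,n}, c(Y_j^*))$. Hence,

(45)    \[ \kappa_{e_{j,n}}(\rho|Y_j^*) = \frac{\Gamma(e_{j,n})\,\Gamma(c(Y_j^*))}{\Gamma(e_{j,n} + c(Y_j^*))} \quad\text{and}\quad E[J_{j,n}|Y_j^*] = \frac{\kappa_{1+e_{j,n}}(\rho|Y_j^*)}{\kappa_{e_{j,n}}(\rho|Y_j^*)} = \frac{e_{j,n}}{e_{j,n} + c(Y_j^*)}. \]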



These types of integrable operations identify joint distributional structures for ( Y* , p) which have product 
form. This fact is summarized in the next result. The result is important for applications of mixture models 
where Y again are missing values and not observables. The result will also play a significant role in Section 
5. 

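Corollary 3.1 Let $\prod_{i=1}^{n} \int_{\mathcal{Y}} g_i(Y_i)\,\mu(dY_i)$ be an integrable function of $\mathcal{P}(d\mu|\rho,\eta)$; then there exists a
conditional distribution of $Y|p$ such that the unique values $\{Y_1^*, \ldots, Y_{n(p)}^*\}$ are independent with distribution

    \[ \mathbb{P}(dY_j^*|\rho,\eta,g) \propto \left[ \prod_{i \in C_j} g_i(Y_j^*) \right] \kappa_{e_{j,n}}(\rho|Y_j^*)\,\eta(dY_j^*), \]

and $p$ has a distribution proportional to $\prod_{j=1}^{n(p)} \int_{\mathcal{Y}} \prod_{i \in C_j} g_i(Y_j^*)\,\kappa_{e_{j,n}}(\rho|Y_j^*)\,\eta(dY_j^*)$. If the integrability
condition still holds when the $(g_i)$ are equal to one, then

(46)    \[ \mathbb{P}(dY_j^*|\rho,\eta) \propto \kappa_{e_{j,n}}(\rho|Y_j^*)\,\eta(dY_j^*), \]

and the distribution of $p$ is proportional to $\prod_{j=1}^{n(p)} \int_{\mathcal{Y}} \kappa_{e_{j,n}}(\rho|y)\,\eta(dy)$. In the homogeneous case the integrability
condition holds only if $\eta$ is a finite measure. When $\eta$ is a finite measure it follows also that the unique values
$Y_j^*$ are iid $\eta(\cdot)/\eta(\mathcal{Y}) := H(\cdot)$.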

Hereafter, based on and Theorem 3.1, denote a joint law (conditional on p) of {(Jj n ,Y*)} as 



(47)    \[ \mathbb{P}(dJ,dY^*|\rho,\eta) := \prod_{j=1}^{n(p)} \mathbb{P}(dJ_{j,n}|\rho,Y_j^*)\,\mathbb{P}(dY_j^*|\rho,\eta). \]

It follows that there exist joint laws of $N, J, Y^*$ given $p$ denoted as

(48)    \[ \mathcal{P}(dN,dJ,dY^*|\rho,\eta) := \mathcal{P}(dN|\rho,\eta,J,Y^*)\,\mathbb{P}(dJ,dY^*|\rho,\eta) := \mathcal{P}(dN,dJ|\rho,\eta,Y^*)\,\mathbb{P}(dY^*|\rho,\eta) \]

and also a joint law of $\mu, Y^*$ given $p$ denoted as

(49)    \[ \mathcal{P}(d\mu,dY^*|\rho,\eta) := \mathcal{P}(d\mu|\rho,\eta,Y) \prod_{j=1}^{n(p)} \mathbb{P}(dY_j^*|\rho,\eta). \]

3.3 Updating and moment formulae 

This section presents a series of important exponential based updating (change of measure) formulae which 
will be used throughout. 

Proposition 3.1 (Updating and moment formulae I) Let $N$ denote a Poisson process with law $\mathcal{P}(dN|\rho,\eta)$.
In addition, let $f$ denote a positive function on $\mathcal{S} \times \mathcal{Y}$. Suppose that $w(\mu)$ is a positive integrable function of
$\mu$, such that it is representable as $w(\mu) = e^{-N(f)}$. Then,
(i)

    \[ w(\mu)\,\mathcal{P}(d\mu|\rho,\eta) = \mathcal{P}(d\mu|e^{-f}\rho,\eta)\,E[w(\mu)|\rho,\eta] := \mathcal{P}(d\mu|e^{-f}\rho,\eta)\,\mathcal{L}_N(f|\rho,\eta). \]

(ii) If $\kappa_n(e^{-f}\rho|y) < \infty$ then for each $n$,

    \[ w(\mu) \prod_{i=1}^{n} \mu(dY_i)\,\mathcal{P}(d\mu|\rho,\eta) = \mathcal{P}(d\mu|e^{-f}\rho,\eta,Y)\,\mathcal{L}_N(f|\rho,\eta)\,M_\mu(dY|e^{-f}\rho,\eta). \]

(iii) If $\kappa_n(\rho|y) < \infty$ holds then,

    \[ w(\mu)\,\mathcal{P}(d\mu|\rho,\eta,Y) = \mathcal{P}(d\mu|e^{-f}\rho,\eta,Y)\,E[w(\mu)|\rho,\eta,Y] \]

and

    \[ \mathcal{L}_N(f|\rho,\eta)\,M_\mu(dY|e^{-f}\rho,\eta) := E[w(\mu)|\rho,\eta,Y]\,M_\mu(dY|\rho,\eta), \]

where $E[w(\mu)|\rho,\eta,Y] := \mathcal{L}_N(f|\rho,\eta) \prod_{j=1}^{n(p)} \int_{\mathcal{S}} e^{-f(s,Y_j^*)}\,\mathbb{P}(J_{j,n} \in ds|Y_j^*,\rho)$.

PROOF. As in Lemma 2.1, it suffices to show that the two sides in statement (i) have the same Laplace
functional. An application of Lemma 2.1, combined with the fact that $\mu$ is a functional of $N$, implies that

(50)    \[ \int_{\mathcal{M}} e^{-\mu(g)} w(\mu)\,\mathcal{P}(d\mu|\rho,\eta) = \int_{\mathcal{M}} e^{-\mu(g)} e^{-N(f)}\,\mathcal{P}(dN|\rho,\eta) := \int_{\mathcal{M}} e^{-\mu(g)}\,\mathcal{P}(d\mu|e^{-f}\rho,\eta)\,E[w(\mu)|\rho,\eta], \]

which yields statement (i). The result in statement (ii) is obtained by replacing $e^{-\mu(g)}$ in (39) with $e^{-\mu(g)}e^{-N(f)}$
and applying Lemma 2.1 to the inner integral. ■
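
The exponential change of measure in statement (i) can be illustrated at the level of Laplace functionals. The sketch below assumes, for illustration only, a Poisson process whose intensity is a finite measure on finitely many points, stored as a dictionary from points to masses. The function names are hypothetical.

```python
from math import exp

def laplace_functional(f, nu):
    """E[exp(-N(f))] for a Poisson process with finite discrete intensity nu.

    `nu` maps points x to masses nu({x}); `f` maps the same points to f(x) >= 0.
    """
    return exp(-sum((1.0 - exp(-f[x])) * w for x, w in nu.items()))

def tilt(f, nu):
    """Intensity exp(-f) * nu of the exponentially tilted Poisson law."""
    return {x: exp(-f[x]) * w for x, w in nu.items()}
```

With these definitions, `laplace_functional` evaluated at the pointwise sum of f and g equals `laplace_functional(f, nu) * laplace_functional(g, tilt(f, nu))`. In words, weighting the Poisson law by exp(-N(f)) and renormalising gives the Poisson law with intensity exp(-f) times the original intensity.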



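Proposition 3.2 (Updating and moment formulae II) Suppose that $f$ in Proposition 3.1 is replaced by
$\tilde{f}_n = \sum_{i=1}^{n} f_i$, for positive integrable functions $f_i$; this implies that $w(\mu) := \prod_{i=1}^{n} w_i(\mu)$ where $w_i(\mu) = e^{-N(f_i)}$.
Now with respect to the joint model $\mathcal{P}(d\mu|\rho,\eta)\prod_{i=1}^{n} w_i(\mu)\mu(dY_i)$, the following additional formulae are
given (all expressions are assumed to be finite):
(i)

    \[ \mathcal{L}_N(\tilde{f}_n|\rho,\eta) := \mathcal{L}_N(f_1|\rho,\eta) \prod_{i=2}^{n} \mathcal{L}_N(f_i|e^{-\tilde{f}_{i-1}}\rho,\eta) := E\left[ \prod_{i=1}^{n} w_i(\mu) \,\Big|\, \rho,\eta \right], \]

where $\tilde{f}_i := \sum_{j=1}^{i} f_j$.

(ii) The expressions below are equivalent, where $\mathbf{Y}_i := (Y_1, \ldots, Y_i)$.

    \[ E[w_i(\mu)|e^{-\tilde{f}_{i-1}}\rho,\eta,\mathbf{Y}_{i-1}]\;m_\mu(dY_i|e^{-\tilde{f}_i}\rho,\eta,\mathbf{Y}_{i-1}), \]

    \[ E[w_i(\mu)|e^{-\tilde{f}_{i-1}}\rho,\eta,\mathbf{Y}_i]\;m_\mu(dY_i|e^{-\tilde{f}_{i-1}}\rho,\eta,\mathbf{Y}_{i-1}). \]

(iii) The above statements coupled with Proposition 3.1 imply that the marginal calculation,

    \[ \int_{\mathcal{M}} \left[ \prod_{i=1}^{n} w_i(\mu)\,\mu(dY_i) \right] \mathcal{P}(d\mu|\rho,\eta), \]

is equivalent to the following formulae,

    \[ \mathcal{L}_N(\tilde{f}_n|\rho,\eta)\,M_\mu(dY|e^{-\tilde{f}_n}\rho,\eta), \]

    \[ E[w(\mu)|\rho,\eta,Y]\,M_\mu(dY|\rho,\eta), \]

    \[ E[w_1(\mu)|\rho,\eta]\,m_\mu(dY_1|e^{-f_1}\rho,\eta) \prod_{i=2}^{n} E[w_i(\mu)|e^{-\tilde{f}_{i-1}}\rho,\eta,\mathbf{Y}_{i-1}]\,m_\mu(dY_i|e^{-\tilde{f}_i}\rho,\eta,\mathbf{Y}_{i-1}), \]

    \[ E[w_1(\mu)|\rho,\eta,Y_1]\,m_\mu(dY_1|\rho,\eta) \prod_{i=2}^{n} E[w_i(\mu)|e^{-\tilde{f}_{i-1}}\rho,\eta,\mathbf{Y}_i]\,m_\mu(dY_i|e^{-\tilde{f}_{i-1}}\rho,\eta,\mathbf{Y}_{i-1}). \]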



REMARK 10. 

A notable special case of Proposition 3.1 (i) was established in Lo and Weng (1989) [see also Lo (1982)
and James (2001a)] for the weighted gamma process. In Lo and Weng (1989, Proposition 3.1), the statement
proceeds as follows.

Proposition 3.1, Lo and Weng (1989). Let $\mathcal{G}_{\eta,\beta}$ denote the law of a weighted gamma process with
shape $\eta$ and weight $\beta(\cdot)$, then for each positive $g$

    \[ \int_{\mathcal{M}} g(\mu)\,e^{-\mu(f)}\,\mathcal{G}_{\eta,\beta}(d\mu) = \mathcal{L}_{\mathcal{G}_{\eta,\beta}}(f) \int_{\mathcal{M}} g(\mu)\,\mathcal{G}_{\eta,\beta^*}(d\mu), \]

where $\beta^* = \beta/(1 + \beta f)$.

This result establishes the absolute continuity of weighted gamma processes and identifies the specific
densities. In other words the result of Lo and Weng (1989, Proposition 3.1) includes the quasi-invariance
result for the gamma process recently established independently in Tsilevich, Vershik and Yor (2001, Theorem
3.1). The applications considered by Tsilevich, Vershik and Yor (2001) are vastly different from Lo and Weng
(1989) and it is not surprising that this result emerges in another context. The development of the exponential
formulae used here is directly inspired by Lo and Weng (1989, Proposition 3.1).

3.4 A simple proof for the almost sure discreteness of size-biased measures

In this section a Fubini argument is used to establish the almost sure discreteness of $\mu$ with law $\mathcal{P}(d\mu|\rho,\eta)$.
This will include a simple alternative proof for the class of completely random measures as discussed in
Kingman (1993, Chapter 10). Kingman's result is based on a modification of Blackwell's (1973) argument
for the Dirichlet process. The present technique is based on the approach of Berk and Savage (1979) and Lo
and Weng (1989) for the Dirichlet process and weighted gamma process respectively. The only requirement
I will need is that $\mu$ admits a disintegration, has a 1st moment measure, or is absolutely continuous with
respect to the law of another measure $\mu^*$ which has one. The latter case of course will yield the result for
the stable law. Measurability issues vanish on Polish spaces. The idea is to apply a 1-step disintegration of
$\mu(dx)\mathcal{P}(d\mu|\rho,\eta)$. For the arguments below it suffices to show that the result holds over all bounded sets $B$,
i.e. sets such that $\eta(B) < \infty$, so without loss of generality we can assume that $\eta$ is a finite measure.

Proposition 3.3 (Almost sure discreteness of measures) Suppose that $\mu$ is $\mathcal{P}(d\mu|\rho,\eta)$ such that $\mu$ has a 1st
moment measure $m_\mu(\cdot|\rho,\eta)$. Otherwise suppose that there exists a measure $\mu^*$ which admits a 1st moment
measure and satisfies the absolute continuity relationship,

(51)    \[ \mathcal{P}(d\mu|\rho,\eta) := g(\mu^*)\,\mathcal{P}(d\mu^*|\rho^*,\eta), \]

for some positive integrable function $g$. Then $\mu$ is almost surely discrete. That is,

(52)    \[ \int_{\mathcal{M}} \mu(\{x : \mu(\{x\}) = 0\})\,\mathcal{P}(d\mu|\rho,\eta) := 0. \]

PROOF. Using a disintegration argument

    \[ \int_{\mathcal{M}} \mu(\{x : \mu(\{x\}) = 0\})\,\mathcal{P}(d\mu|\rho,\eta) := \int_{\mathcal{Y}} \left[ \int_{\mathcal{M}} I\{x : \mu(\{x\}) = 0\}\,\mathcal{P}(d\mu|\rho,\eta,x) \right] m_\mu(dx|\rho,\eta). \]

From Theorem 3.1 the law of $\mu$ taken with respect to the inner term (that is, given $x$) is the same as

    \[ \mu(\cdot) + h(J)\,\delta_x(\cdot), \]

where $h(J)$ is strictly positive and $J$ has law $\propto h(s)\rho(ds|x)$ for almost all $x$. But since $\mu$ is not negative it
follows that

    \[ \mu(\{x\}) + h(J)\,\delta_x(\{x\}) \geq h(J) > 0 \]

for almost all $x$ with respect to $m_\mu$. Hence the inner term is zero, which concludes the result. If again $\mu$ does
not admit such a disintegration, apply the result to the random measure $\mu^*$, with for instance Levy measure
$e^{-gh}\rho$, or some other operation, and use the absolute continuity of measures. ■



REMARK 11. This method also implies the almost sure discreteness of the measures in Section 5.
Beyond the mild restriction to Polish spaces, I believe this is the most general result of this type. The
absolute continuity in (51) coupled with the existence of such $\mu^*$, for instance via Proposition 3.1, seems to
exhaust the possibilities.

4 Intensity rate mixture models, Levy moving averages and shot-noise processes

In this section analysis for the class of mixture of hazards models as discussed in the introduction, otherwise
known as Levy-Cox moving average models, is given. Very little is known about the posterior structure of such
models, with a notable exception being the case of mixtures of weighted gamma processes which is discussed
in various degrees of generality in Dykstra and Laud (1981), Lo (1982), Lo and Weng (1989). Wolpert and
Ickstadt (1998a) and James (2001a) consider semiparametric extensions of this model. Full partition based
posterior analysis in a general multiplicative setting is given in Lo and Weng (1989) and James (2001a).
One consequence of the absence of a general analysis of this model is the unavailability of computational
procedures which sample from the updated or posterior based models. As mentioned in the introduction
such models are currently used in spatial statistical applications and survival analysis. An interesting class
of models which fits into this framework is the generalised gamma process proposed in Brix (1999). Wolpert
and Ickstadt (1998b) propose models based on arbitrary Levy processes. The analysis here includes these
models as well as mixture models based on the general size-biased random measures described in section 3.
Consider the random hazard or intensity rate,

(53)    \[ \lambda(t|\mu) = \int_{\mathcal{Y}} K(t|v)\,\mu(dv), \]

where $K$ denotes a known $(\tau,\eta)$-integrable kernel on a Polish space $\mathcal{X} \times \mathcal{Y}$ and $\mu$ is modelled as a random
measure with law $\mathcal{P}(d\mu|\rho,\eta)$. The representation in (53) defines a large class of random measures $\Lambda$ which
are not independent increment processes.

In this section explicit posterior characteristics of $\mu$, and hence $\Lambda$, based on the multiplicative intensity
likelihood,

(54)    \[ L(X|\mu)\,\mathcal{P}(d\mu|\rho,\eta) = \left[ \prod_{i=1}^{n} \int_{\mathcal{Y}} K(X_i|Y_i)\,\mu(dY_i) \right] e^{-\mu(f_K)}\,\mathcal{P}(d\mu|\rho,\eta), \]

are derived, where $f_K(v) = \left[ \int_{\mathcal{X}} Y(s)K(s|v)\,\tau(ds) \right]$.

Here $X = (X_1, \ldots, X_n)$ are observations in a (Polish space) region $\mathcal{X}$ and the $Y_i$ can be viewed as missing observations
on $\mathcal{Y}$. $Y(s)$ is a non-negative predictable function which for many applications in event history analysis
denotes the number of observed individuals still at risk just before time $s$. An important point is that the
structural form of $L(X|\mu)$ remains the same under right censoring and left filtering [see Jacod (1975) and
Andersen, Borgan, Gill and Keiding (1993)]. If $Y(s) = 1$ then the model may correspond to the likelihood
of an inhomogeneous Poisson process with intensity rate $\lambda(t|\mu)$. One purpose of the development of Lemma
2.1 is to handle the exponential term in (54), thus mimicking the application of Lo and Weng (1989,
Proposition 3.1). Proposition 3.1 and Theorem 3.1 in this paper readily yield the desired results. Moreover
the appearance of the exponential term combined with Proposition 3.1 shows that analysis of this model only
requires the weaker condition,

(55)    \[ \kappa_n(e^{-f_K}\rho|y) < \infty. \]

The condition (55) will be assumed throughout this section and now for instance admits analysis for the
stable law $\rho_{\alpha,0}$, via Proposition 3.1.

An application of Proposition 3.1 combined with Corollary 3.1 shows that the marginal likelihood is,

(56)    \[ \int_{\mathcal{M}} L(X|\mu)\,\mathcal{P}(d\mu|\rho,\eta) = \mathcal{L}_\mu(f_K|\rho,\eta) \sum_{p} \prod_{j=1}^{n(p)} \int_{\mathcal{Y}} \left[ \prod_{i \in C_j} K(X_i|Y_j^*) \right] \kappa_{e_{j,n}}(e^{-f_K}\rho|Y_j^*)\,\eta(dY_j^*). \]
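
To make the multiplicative intensity likelihood concrete, the following sketch evaluates it for a fixed discrete measure. Everything specific here is an illustrative assumption rather than part of the model above: the observation region is the interval [0, horizon], the at-risk function is identically 1, the dominating measure is Lebesgue measure, the kernel is a Gaussian density, and the measure is a finite list of (location, weight) atoms. The function names are hypothetical.

```python
from math import erf, exp, log, pi, sqrt

def kernel(t, v, sd=0.5):
    """Gaussian density in t centred at v (an illustrative choice of kernel)."""
    return exp(-0.5 * ((t - v) / sd) ** 2) / (sd * sqrt(2.0 * pi))

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def f_K(v, horizon, sd=0.5):
    """Integral of the kernel over [0, horizon] for a fixed location v."""
    return normal_cdf((horizon - v) / sd) - normal_cdf((0.0 - v) / sd)

def intensity(t, atoms, sd=0.5):
    """Intensity at t for a discrete measure given as (location, weight) pairs."""
    return sum(w * kernel(t, y, sd) for y, w in atoms)

def log_likelihood(xs, atoms, horizon, sd=0.5):
    """Log of the product of intensities at the observations times exp(-mu(f_K))."""
    mu_f = sum(w * f_K(y, horizon, sd) for y, w in atoms)
    return sum(log(intensity(x, atoms, sd)) for x in xs) - mu_f
```

The exponent subtracted in `log_likelihood` is the integral of the intensity over the observation window, which is why the likelihood contains the factor exp(-mu(f_K)).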



4.1 Posterior characterizations 

Explicit posterior characterizations are now given which follow immediately from Theorem 3.1, Corollary
3.1 and Proposition 3.1. (Note that it is assumed throughout that all relevant integrals are finite.)

Theorem 4.1 The posterior distribution of $Y, \mu|X$ based on the model (54) is representable as

    \[ \pi(dY,d\mu|X) \propto \mathcal{P}(d\mu|e^{-f_K}\rho,\eta,Y) \left[ \prod_{j=1}^{n(p)} \mathbb{P}(dY_j^*|e^{-f_K}\rho,\eta,K) \right] \pi(p|X), \]

where $\pi(p|X) \propto \prod_{j=1}^{n(p)} \int_{\mathcal{Y}} \prod_{i \in C_j} K(X_i|Y_j^*)\,\kappa_{e_{j,n}}(e^{-f_K}\rho|Y_j^*)\,\eta(dY_j^*)$ is a (posterior) distribution of $p|X$.

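Similar to Lo and Weng (1989, Theorem 4.2), Theorem 4.1 implies for instance that the posterior
expectation of the intensity, $\lambda|Y$, is

(57)    \[ E[\lambda(t|\mu)|Y] = \int_{\mathcal{Y}} K(t|v)\,\kappa_1(e^{-f_K}\rho|v)\,\eta(dv) + \sum_{j=1}^{n(p)} K(t|Y_j^*)\,\frac{\kappa_{1+e_{j,n}}(e^{-f_K}\rho|Y_j^*)}{\kappa_{e_{j,n}}(e^{-f_K}\rho|Y_j^*)}, \]

and hence the posterior expectation given $X$ is,

(58)    \[ E[\lambda(t|\mu)|X] = \sum_{p} \left[ \int_{\mathcal{Y}} K(t|v)\,\kappa_1(e^{-f_K}\rho|v)\,\eta(dv) + \sum_{j=1}^{n(p)} \int_{\mathcal{Y}} K(t|v)\,\frac{\kappa_{1+e_{j,n}}(e^{-f_K}\rho|v)}{\kappa_{e_{j,n}}(e^{-f_K}\rho|v)}\,\pi(dv|C_j) \right] \pi(p|X). \]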



EXAMPLE [Generalised gamma process]. A brief description of the results related to the usage of
a generalised gamma process with intensity $\rho_{\alpha,b}(ds)\eta(dy)$, as in Brix (1999), is given. Note in particular
that the result holds for the stable case $b = 0$. The posterior distribution of $\mu$ given $Y$ is denoted as
$\mathcal{P}(d\mu|e^{-f_K}\rho_{\alpha,b},\eta,Y)$. That is, the $J_{j,n}$ given $Y_j^*$ are independent $\mathcal{G}(e_{j,n} - \alpha, b + f_K(Y_j^*))$, and $\mu$ is now a weighted
generalised gamma process with Laplace functional

    \[ \exp\left\{ -\frac{1}{\alpha} \int_{\mathcal{Y}} \left[ (b + f_K(v) + g(v))^{\alpha} - (b + f_K(v))^{\alpha} \right] \eta(dv) \right\}. \]

The joint moment measure of $Y$ can be expressed as,

(59)    \[ \left[ \prod_{j=1}^{n(p)} \Gamma(e_{j,n} - \alpha) \right] \prod_{j=1}^{n(p)} (b + f_K(Y_j^*))^{-(e_{j,n} - \alpha)}\,\eta(dY_j^*), \]

which generalizes an expression for the weighted gamma process, see Lo and Weng (1989) and James (2001a).
See James (2001b) for more details related to the generalised gamma model. ■

REMARK 12. Note that from a practical point of view the distribution of $J_{j,n}$ based on $\mathcal{P}(d\mu|e^{-f_K}\rho,\eta,Y)$
may not always be easy to simulate. If however the moment condition in Theorem 3.1 holds then one can use
an alternative characterization of the posterior based on $\mathcal{P}(d\mu|\rho,\eta,Y)$. In that case one does not marginalize
over the exponential term but instead works with the measure,



REMARK 13. James (2001a) gives results for semi-parametric weighted gamma process mixture models
under more complex multiplicative intensity structures. That is, for cases where the kernel $K$ depends
on a Euclidean parameter $\psi$ and where for instance there may be several independent Poisson processes.
A careful examination of that work, coupled with the results given here, provides an obvious way to
obtain the corresponding result for the general processes. A notable wrinkle is that the Laplace functional
will depend on $\psi$. A discussion of this is omitted for brevity.

4.2 Simulating the posterior 

Here the algorithm discussed in Section 2.3 is applied to this setting to demonstrate a possible approach
to approximate posterior quantities. Again MCMC based methods can also be deduced from the algorithm
below. First set,

    \[ l(r|K) = \int_{\mathcal{Y}} K(X_{r+1}|v)\,\kappa_1(e^{-f_K}\rho|v)\,\eta(dv) + \sum_{j=1}^{n(p_r)} \int_{\mathcal{Y}} K(X_{r+1}|v)\,\frac{\kappa_{1+e_{j,r}}(e^{-f_K}\rho|v)}{\kappa_{e_{j,r}}(e^{-f_K}\rho|v)}\,\pi(dv|C_{j,r}), \]

where in particular, $l(0|K) = \int_{\mathcal{Y}} K(X_1|v)\,\kappa_1(e^{-f_K}\rho|v)\,\eta(dv)$.

One can now use the variant of the WCR described in Section 2.3 with the seating rule: given $p_r$,
customer $r+1$ sits at table $C_{j,r}$ with probability

    \[ \mathbb{P}(p_{r+1}|p_r) = l(r|K)^{-1} \int_{\mathcal{Y}} K(X_{r+1}|v)\,\frac{\kappa_{1+e_{j,r}}(e^{-f_K}\rho|v)}{\kappa_{e_{j,r}}(e^{-f_K}\rho|v)}\,\pi(dv|C_{j,r}), \]

where $p_{r+1} = p_r \cup \{r+1 \in C_{j,r}\}$ for $j = 1, \ldots, n(p_r)$. Otherwise, customer $r+1$ sits at a new table with
probability

    \[ \mathbb{P}(p_{r+1}|p_r) = l(r|K)^{-1} \int_{\mathcal{Y}} K(X_{r+1}|v)\,\kappa_1(e^{-f_K}\rho|v)\,\eta(dv). \]

The completion of Step $n$ produces a $p = \{C_1, \ldots, C_{n(p)}\} = p_n$, where now, from Lemma 2.3, $p$ is drawn
from the density $q(p|K)$ which satisfies,

    \[ \mathcal{L}(p|K)\,q(p|K) = \prod_{j=1}^{n(p)} \int_{\mathcal{Y}} \left[ \prod_{i \in C_j} K(X_i|Y_j^*) \right] \kappa_{e_{j,n}}(e^{-f_K}\rho|Y_j^*)\,\eta(dY_j^*). \]

This fact, together with Theorem 4.1, implies that for any integrable function $t(p)$,

(60)    \[ \sum_{p} t(p)\,\pi(p|X) = \frac{\sum_{p} t(p)\,\mathcal{L}(p|K)\,q(p|K)}{\sum_{p} \mathcal{L}(p|K)\,q(p|K)}. \]

The expression (60) and Theorem 4.1 now suggest a method to approximate the posterior law $\mathcal{P}(d\mu|X)$.

1. Using the seating algorithm above, draw B iid random partitions $p = \{C_1, \ldots, C_{n(p)}\}$ from $q(p|K)$.

2. Use the value of $p$ to draw $Y_j^*$ independently from $\pi(dY_j^*|C_j)$ for $j = 1, \ldots, n(p)$. This yields
$Y^* = (Y_1^*, \ldots, Y_{n(p)}^*)$.

3. Using the current value of $(Y^*, p)$, approximate a draw from the random measure

(61)    \[ \mu_{f_K}(\cdot) + \sum_{j=1}^{n(p)} h(J_{j,n})\,\delta_{Y_j^*}(\cdot), \]

which is distributed as $\mathcal{P}(d\mu|e^{-f_K}\rho,\eta,Y)$.

4. To approximate the posterior law of a functional $g(\mu)$, run the previous steps $B$ times independently,
obtaining values $\mu^{(b)}$ with importance weights $\mathcal{L}(p^{(b)}|K)$, for $b = 1, \ldots, B$. Approximate the law
$\mathbb{P}(g(\mu) \in \cdot\,|X)$ with

(62)    \[ \frac{\sum_{b=1}^{B} I\{g(\mu^{(b)}) \in \cdot\}\,\mathcal{L}(p^{(b)}|K)}{\sum_{b=1}^{B} \mathcal{L}(p^{(b)}|K)}. \]

If one only needs to approximate moments, or an integration which yields a $t(p)$ in closed form, for
instance the likelihood, then steps 2 and 3 can be eliminated and one can replace (62) with

(63)    \[ \frac{\sum_{b=1}^{B} t(p^{(b)})\,\mathcal{L}(p^{(b)}|K)}{\sum_{b=1}^{B} \mathcal{L}(p^{(b)}|K)}. \]
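
The final weighted average is a self-normalised importance sampling estimate and is simple to compute once the values and weights from the B runs are stored. The sketch below takes them as plain lists. The function names are hypothetical, and the effective sample size is a standard diagnostic added here for convenience rather than something taken from the text.

```python
def weighted_posterior_mean(values, weights):
    """Self-normalised estimate: sum_b t_b * L_b / sum_b L_b."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("importance weights must have a positive sum")
    return sum(t * w for t, w in zip(values, weights)) / total

def effective_sample_size(weights):
    """(sum_b L_b)**2 / sum_b L_b**2, a rough check for weight degeneracy."""
    return sum(weights) ** 2 / sum(w * w for w in weights)
```

An effective sample size far below B suggests that a few partitions dominate the estimate and that more draws, or a better proposal, may be needed.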



REMARK 14. Note that in (61) the main difficulty is to approximate a draw from $\mu_{f_K}$. Brix (1999)
discusses methods on how to approximate a generalized gamma process $\mathcal{P}(d\mu|\rho_{\alpha,b},\eta)$. It should be
straightforward to extend this to a $\mathcal{P}(d\mu|e^{-f_K}\rho_{\alpha,b},\eta)$. See also Wolpert and Ickstadt (1998b) for some possible ideas
in the general setting. I believe that the mixture representations given in the next section may also be useful
in this regard.



5 Analysis of a Scaling operation which arises in Brownian Excursion theory

The previous section describes applications of various exponential change of measure operations. As shown
in Lemma 2.1, this operation results in a change of measure from a Poisson process with intensity $\nu$ to
another Poisson process law with intensity $e^{-f}\nu$. An important aspect of this is that one can still apply
Lemma 2.2 directly to the transformed Poisson law, which yields the various results in the previous section. In
particular this operation transforms processes $\mu$ which do not admit moment measures to ones which do.
An important example is the stable law which is transformed to a form of weighted generalised gamma
process. Another important operation, besides the exponential change of measure, is a type of scaling, which
arises for instance in Brownian excursion theory [see for instance Pitman and Yor (1992, 1997, 2001)]. This
operation no longer preserves the Poisson nature of $N$ or similarly the structure of the biased models $\mu$. This
in itself does not present a major obstacle as one could still apply Lemma 2.2 to the Poisson law first. The
law resulting from the scaling operation may not have an obviously understandable form. However,
hidden in a scaling operation is an exponential form via a gamma integral identity. This identity has been
used frequently in various contexts. Here, a slightly different variation will be used, where the exponential
change of measure idea will be applied internally, leading to a variety of interesting consequences. As an
important special case we look at the $PD(\alpha,\theta)$ model. In general, analysis of the simple scaling structure
leads to results quite related to Pitman and Yor (1992, 1997, 2001) [see also Perman, Pitman and Yor
(1992), Section 4]. In particular see Pitman and Yor (1992), Section 3.
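
For reference, the textbook form of the gamma integral identity alluded to above is recorded here for $\theta > 0$ and $T > 0$. The text states that a variation of it is used, so this is only the standard version and not necessarily the exact form applied later.

```latex
T^{-\theta} = \frac{1}{\Gamma(\theta)} \int_0^\infty u^{\theta - 1} e^{-uT}\,du , \qquad \theta > 0,\ T > 0 .
```

A negative power of the total mass $T$ is thereby expressed as a mixture over $u$ of exponential weights $e^{-uT}$, each of which is of the type handled by the exponential change of measure.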

REMARK 15. It will become quite clear to the experts on excursion theory and such matters that
the overlap with Pitman and Yor (1992, Section 3) is hardly coincidental, although this was not my initial
motivation. Some of the results given below amplify on Pitman and Yor (1992, Theorem 3.1 and especially
Remarks 3.3 and 3.4). This section may be viewed as an extension of their Section 3. Given the new results
I obtain for the $PD(\alpha,\theta)$, among other things, the present exposition should clearly provide new insights,
for the experts, into matters of which I myself have no expertise. Again applications of Lemma 2.1 and
Proposition 3.1 play a fundamental role.

REMARK 16. In this section it is assumed that $\eta$ satisfies the integrability condition in Corollary 3.1
when $g := 1$. That is, when $\rho$ is homogeneous then $\eta := cH$ for some scalar $c$. For the Dirichlet process
$\eta := \theta H$, but this $\theta$ is not to be confused with the $\theta$ used below to mimic the scaling operation associated
with $PD(\alpha,\theta)$ for $\alpha > 0$.

The present analysis yields new mixture representations of random measures $N$ and $\mu$. What is most
important is how they arise within the context of the scaling operation. [The reader should again note Pitman
and Yor (1992, Remark 3.4)]. This leads to analogous representations of random probability measures
defined as,

(64)    \[ P(\cdot) = \frac{\mu(\cdot)}{\mu(\mathcal{Y})} = \frac{\int_{\mathcal{S}} h(s)N(ds,\cdot)}{\int_{\mathcal{S}} h(u)N(du,\mathcal{Y})} = \frac{\mu(\cdot)}{T}, \]

whose law is determined by an appropriate law on $N$. That is either a Poisson law or a scaled Poisson law to
be described below. In addition various characterizations of the posterior distributions of $N$, $\mu$, and $P$ are
obtained. These results are applied to obtain general identities and representations for the two-parameter
family with parameters $0 < \alpha < 1$, $\theta > -\alpha$. This extends an identity given for the case $PD(\alpha,\alpha)$ in Pitman
and Yor (2001).

It is known that the general $\mathcal{PD}_{\alpha,\theta}(dP|H)$ cannot directly be defined via normalization of an independent
increment process. The exceptions are the Dirichlet process and the stable law process. Pitman and Yor
(1997) [see also Tsilevich, Vershik and Yor (2000)] establish the following relationship. Let $\mathcal{PD}_{\alpha,\theta}(d\mu|\eta)$
denote a law of μ such that its normalisation results in a Poisson-Dirichlet random probability measure, with
law denoted as $\mathcal{PD}_{\alpha,\theta}(dP|H)$. In particular $\mathcal{PD}_{\alpha,0}(d\mu|\eta) := \mathcal{P}(d\mu|\rho_\alpha,\eta)$ is the stable process. From Pitman
and Yor (1997) [see in particular Tsilevich, Vershik and Yor (2000), where this form is taken], it follows that
for 0 < α < 1, θ > -α,

(65) $\mathcal{PD}_{\alpha,\theta}(d\mu|\eta) = c_{\alpha,\theta}\,T^{-\theta}\,\mathcal{PD}_{\alpha,0}(d\mu|\eta),$

where $c_{\alpha,\theta}$ is a normalizing constant. Pitman and Yor (1997) describe various results concerning the PD(α, θ)
law of the $(P_i)$ related to this fact, notably Proposition 21, Proposition 22, and Proposition 33. In particular
they show that if the sequence $(P_i)$ is PD(α, θ) then



(66) $PD(\alpha,\theta) = \int_0^\infty PD(\alpha|t)\,\gamma_{\alpha,\theta}(dt),$

where PD(α|t) denotes the conditional Poisson-Kingman [see Section 8] law of PD(α) conditioned on T = t,
$\gamma_{\alpha,\theta}(dt) = c_{\alpha,\theta}\,t^{-\theta}f_\alpha(t)\,dt$, the normalising constant is given by $1/c_{\alpha,\theta} = E[T^{-\theta}] = \int_0^\infty t^{-\theta}f_\alpha(t)\,dt$, and $f_\alpha(t)$ denotes the
density of a stable law random variable, which is the distribution of T. Tsilevich, Vershik and Yor (2000),
using (65) in combination with the following gamma identity for θ > 0,

(67) $T^{-\theta} = \frac{1}{\Gamma(\theta)}\int_0^\infty v^{\theta-1}e^{-vT}\,dv,$

establish quite remarkably and simply a two-parameter extension of the Markov-Krein correspondence via
the Laplace functional of a stable law. [Their result will be extended in a general fashion in Section 6]. This
identity is also used in Perman, Pitman and Yor (1992) and Pitman and Yor (1997), among other places. The
results discussed above are used primarily to deduce properties related to the normalized process μ(·)/T,
that is, equivalently, the PD(α, θ) family. The interest here, however, is in another characterization of the law
$\mathcal{PD}_{\alpha,\theta}(d\mu|\eta)$ which allows more direct usage of it and of course implies results for the normalized process and,
synonymously, PD(α, θ). The method relies on using the identity (67), conditioning on various transformed
densities for V rather than T, which will lead to a variety of interesting results. This analysis will be applied
to general processes μ and N subject to the same type of scaling operation. For all θ > -α, in particular
the case -α < θ < 0, the same identity (67) will be used for each fixed n,

(68) $\frac{1}{\Gamma(n)}\int_0^\infty T^{n}v^{n-1}e^{-vT}\,dv := 1,$

where the main point now is to work with $T^n$ rather than its reciprocal, and otherwise to use the fact that the
left-hand side of (68) is one. Now assuming that



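(69) $E[T^{-\theta}|\rho,\eta] := \int_{\mathcal{M}} T^{-\theta}\,\mathcal{P}(dN|\rho,\eta) := \int_{\mathcal{M}} T^{-\theta}\,\mathcal{P}(d\mu|\rho,\eta) < \infty,$

consider the laws

(70) $\mathcal{P}(dN|\rho,\eta,\theta) := \frac{T^{-\theta}\,\mathcal{P}(dN|\rho,\eta)}{E[T^{-\theta}|\rho,\eta]}$, and $\mathcal{P}(d\mu|\rho,\eta,\theta) := \frac{T^{-\theta}\,\mathcal{P}(d\mu|\rho,\eta)}{E[T^{-\theta}|\rho,\eta]}.$

For each fixed v, let $E[e^{-vT}|\rho,\eta] := \int_{\mathcal{M}} e^{-vT}\,\mathcal{P}(dN|\rho,\eta)$ denote the Laplace transform of T taken relative
to $\mathcal{P}(dN|\rho,\eta)$. The following relationship will prove useful and allows analysis for the case -α < θ < 0.
Suppose that for n ≥ 1,

(71) $E[T^{n}|\rho,\eta,\theta+n] := \int_{\mathcal{M}} T^{n}\,\mathcal{P}(dN|\rho,\eta,\theta+n) = \frac{E[T^{-\theta}|\rho,\eta]}{E[T^{-(\theta+n)}|\rho,\eta]} < \infty;$

then

(72) $\mathcal{P}(dN|\rho,\eta,\theta) := \frac{T^{n}\,\mathcal{P}(dN|\rho,\eta,\theta+n)}{E[T^{n}|\rho,\eta,\theta+n]}$, and $\mathcal{P}(d\mu|\rho,\eta,\theta) := \frac{T^{n}\,\mathcal{P}(d\mu|\rho,\eta,\theta+n)}{E[T^{n}|\rho,\eta,\theta+n]}.$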
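The gamma identities (67) and (68) drive everything that follows, so a quick numerical sanity check may be useful. The sketch below is illustrative only: the function names, the Simpson integrator, the truncation point 60/t and the choice θ ≥ 1 (so that the integrand stays bounded at zero) are assumptions made for this example, not part of the original text.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def gamma_identity_67(t, theta):
    """Right-hand side of (67): (1/Gamma(theta)) * int_0^inf v^(theta-1) e^(-v t) dv."""
    upper = 60.0 / t  # beyond this point e^(-v t) < e^(-60), so the tail is negligible
    integral = simpson(lambda v: v ** (theta - 1.0) * math.exp(-v * t), 0.0, upper)
    return integral / math.gamma(theta)


def gamma_identity_68(t, n):
    """Left-hand side of (68): (1/Gamma(n)) * int_0^inf t^n v^(n-1) e^(-v t) dv."""
    upper = 60.0 / t
    integral = simpson(lambda v: t ** n * v ** (n - 1) * math.exp(-v * t), 0.0, upper)
    return integral / math.gamma(n)


if __name__ == "__main__":
    t, theta = 0.7, 2.5
    print(gamma_identity_67(t, theta), t ** (-theta))  # the two values should agree closely
    print(gamma_identity_68(t, 3))                     # should be close to 1
```

Here t plays the role of a fixed realisation of T. Identity (67) trades the negative power of T for an exponential in T, which is what allows Lemma 2.1 to be applied inside the integral.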

Additionally, denote the laws of P taken relative to $\mathcal{P}(\cdot|\rho,\eta)$ and $\mathcal{P}(dN|\rho,\eta,\theta)$ as $\mathcal{P}(dP|\rho,\eta)$ and $\mathcal{P}(dP|\rho,\eta,\theta)$,
respectively. When ρ does not depend on y and η := cH, then

(73) $P := \sum_{i=1}^{\infty} P_i\,\delta_{Z_i},$

where the sequence $(P_i)$ is independent of $(Z_i)$, which are iid H. In that case the analysis of P is in principle
equivalent to the analysis of $(P_i)$. That is, the P models are special cases of species sampling models [see
Pitman (1996), Hansen and Pitman (2001) and Ishwaran and James (2001a)]. Hence, for instance, the results
of Perman, Pitman and Yor (1992) and Pitman (1995b) concerning the EPPF, etc., can be applied to this
setting to obtain information about P. When ρ depends on y, one can still represent P in the form
(73). However, the independence property between $(P_i)$ and $(Z_i)$ no longer holds. It will become clear that
if interest is simply in the marginal distributional properties of the $(P_i)$ then the results of Perman, Pitman
and Yor (1992) and Pitman (1995b) can be applied using rather than p. At any rate, a different analysis 
will be used here, based primarily on information contained in the random measures N and μ, which
will yield relevant information about the $(P_i)$, etc. This is useful even in the species sampling case, where the
EPPF may be intractable or does not easily convey information about P.

REMARK 17. Although the emphasis seems to be on the scaled laws $\mathcal{P}(\cdot|\rho,\eta,\theta+n)$, this is only partially
the case. In particular, it should be clear that the negative moment conditions (69) and (71) do not hold
in general. This is the case for a gamma process, which could be denoted as $\mathcal{PD}_{0,\theta}(d\mu|\eta)$, and hence it is
not a proper $\mathcal{P}(d\mu|\rho,\eta,\theta)$. The forthcoming discussion investigates properties of general N and μ based on
manipulation of both (67) and (68). The results below related to $\mathcal{P}(\cdot|\rho,\eta,\theta+n)$ hold provided (69) and (71)
are true.



5.1 Mixture representations for general processes 

The results follow from the proof of Theorem 5.1 below. 

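Proposition 5.1 The following identities hold, which have various implications.
(i) For n > 0,

(74) $\sum_{\mathbf{p}}\frac{1}{\Gamma(n)}\int_0^\infty v^{n-1}E[e^{-vT}|\rho,\eta]\prod_{j=1}^{n(\mathbf{p})}\Big[\int_{\mathcal{Y}}\kappa_{e_{j,n}}(e^{-vh}\rho|y)\,\eta(dy)\Big]\,dv = 1,$

which is equivalent to $\frac{1}{\Gamma(n)}\int_0^\infty E[T^{n}|e^{-vh}\rho,\eta]\,v^{n-1}E[e^{-vT}|\rho,\eta]\,dv$.
(ii) Statement (i) implies that there exists a joint distribution of V, $\mathbf{p}$ given by

(75) $\pi_{n,0}(dv,\mathbf{p}|\rho,\eta) := \frac{1}{\Gamma(n)}\,v^{n-1}E[e^{-vT}|\rho,\eta]\prod_{j=1}^{n(\mathbf{p})}\Big[\int_{\mathcal{Y}}\kappa_{e_{j,n}}(e^{-vh}\rho|y)\,\eta(dy)\Big]\,dv.$

(iii) Suppose that the negative moment condition (69) holds for θ + n > 0; then $\Gamma(\theta+n)E[T^{-(\theta+n)}|\rho,\eta] := \int_0^\infty v^{\theta+n-1}E[e^{-vT}|\rho,\eta]\,dv$, which
identifies a random variable V on (0, ∞) with density

(76) $\pi_{\theta+n}(dv|\rho,\eta) := \frac{v^{\theta+n-1}E[e^{-vT}|\rho,\eta]\,dv}{\Gamma(\theta+n)\,E[T^{-(\theta+n)}|\rho,\eta]}.$

Additionally, $E[T^{n}|\rho,\eta,\theta+n] := \int_0^\infty E[T^{n}|e^{-vh}\rho,\eta]\,\pi_{\theta+n}(dv)$. Hence there exists a random variable V such
that V, $\mathbf{p}$ has joint density

(77) $\pi_{\theta+n,\theta}(dv,\mathbf{p}|\rho,\eta) := \frac{E[T^{-(\theta+n)}|\rho,\eta]}{E[T^{-\theta}|\rho,\eta]}\prod_{j=1}^{n(\mathbf{p})}\Big[\int_{\mathcal{Y}}\kappa_{e_{j,n}}(e^{-vh}\rho|y)\,\eta(dy)\Big]\,\pi_{\theta+n}(dv).$

(iv) The marginal distribution of $\mathbf{p}$ in (75) and (77) is given by

(78) $\pi_{\theta+n,\theta}(\mathbf{p}|\rho,\eta) := \frac{E[T^{-(\theta+n)}|\rho,\eta]}{E[T^{-\theta}|\rho,\eta]}\int_0^\infty\prod_{j=1}^{n(\mathbf{p})}\Big[\int_{\mathcal{Y}}\kappa_{e_{j,n}}(e^{-vh}\rho|y)\,\eta(dy)\Big]\,\pi_{\theta+n}(dv).$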



REMARK 18. The formula $\pi_{n,0}(\mathbf{p})$ derived from (78) corresponds to the EPPF formula given in Pitman
(1995b, Corollary 6, formula (32)), when h(s) := s ∈ (0, ∞) and ρ is homogeneous. Pitman (1995b) uses the
identity (67) applied to $T^{-n}$. Indeed, it will be stated formally that the expressions in (78) are EPPF formulas, which can be
seen as an extension of Pitman's result.

The next result identifies $\mathcal{P}(dN|\rho,\eta,\theta)$ and in fact arbitrary Poisson laws $\mathcal{P}(dN|\rho,\eta)$ as a mixture
relative to Poisson random measures N* with conditional (given V) laws $\mathcal{P}(dN|e^{-vh}\rho,\eta)$. Moreover, since
the conditional law on N* has moments, the result also contains mixture representations of the type
$N^* + \sum_{j=1}^{n(\mathbf{p})}\delta_{(J_{j,n},Y_j^*)}$. The marginal distribution of N* actually depends on n.

Theorem 5.1 (Poisson mixture representations) (i) For θ > 0, the law $\mathcal{P}(dN|\rho,\eta,\theta)$ defined in (70) is
expressible as the following mixture,

(79) $\mathcal{P}(dN|\rho,\eta,\theta) := \int_0^\infty \mathcal{P}(dN|e^{-vh}\rho,\eta)\,\pi_\theta(dv|\rho,\eta).$

This indicates that N|V = v has a Poisson law $\mathcal{P}(dN|e^{-vh}\rho,\eta)$ with intensity $e^{-vh(s)}\rho(ds|y)\eta(dy)$, and V is
a random variable on (0, ∞) with density $\pi_\theta(dv|\rho,\eta)$. Equivalently, the Laplace functional of N is

(80) $\mathcal{L}_N(f|\rho,\eta,\theta) := \int_0^\infty \mathcal{L}_N(f|e^{-vh}\rho,\eta)\,\pi_\theta(dv|\rho,\eta).$

[Compare statement (i) with Pitman and Yor (1992, Remarks 3.3, 3.4).]
(ii) For each n > 0, a Poisson law $\mathcal{P}(dN|\rho,\eta)$ can be represented as

(81) $\mathcal{P}(dN|\rho,\eta) := \sum_{\mathbf{p}}\int_0^\infty \mathcal{P}(dN,dJ,dY^*|e^{-vh}\rho,\eta)\,\pi_{n,0}(dv,\mathbf{p}|\rho,\eta).$

That is, the random measure N can be represented as

(82) $N^* + \sum_{j=1}^{n(\mathbf{p})}\delta_{(J_{j,n},Y_j^*)},$

where, given V, $\mathbf{p}$ with law $\pi_{n,0}(dv,\mathbf{p}|\rho,\eta)$, N* is conditionally independent of J, Y*, with distribution $\mathcal{P}(dN|e^{-vh}\rho,\eta)$.
The random variables (J, Y*) given V, $\mathbf{p}$ have distribution $\mathcal{P}(dJ,dY^*|e^{-vh}\rho,\eta)$.
(iii) For θ + n > 0, which includes -α < θ < 0, the law $\mathcal{P}(dN|\rho,\eta,\theta)$ is equivalent to

(83) $\sum_{\mathbf{p}}\int_0^\infty \mathcal{P}(dN,dJ,dY^*|e^{-vh}\rho,\eta)\,\pi_{\theta+n,\theta}(dv,\mathbf{p}|\rho,\eta).$

That is, N can be represented as in (82), except that the distribution of V, $\mathbf{p}$ is $\pi_{\theta+n,\theta}(dv,\mathbf{p}|\rho,\eta)$. When θ := 0,
statement (iii) reduces to statement (ii).

PROOF. For the result in (i), it suffices to evaluate the Laplace functional of the law defined in (70). That
is,

$E[T^{-\theta}|\rho,\eta]\int_{\mathcal{M}} e^{-N(f)}\,\mathcal{P}(dN|\rho,\eta,\theta) := \int_{\mathcal{M}} e^{-N(f)}\,T^{-\theta}\,\mathcal{P}(dN|\rho,\eta).$

Now apply the identity (67) to see that the right-hand side is equal to

(84) $\frac{1}{\Gamma(\theta)}\int_0^\infty v^{\theta-1}E[e^{-N(f)}e^{-vT}|\rho,\eta]\,dv.$

An application of Lemma 2.1 and dividing through by $E[T^{-\theta}|\rho,\eta]$ now show that the Laplace functional of
$\mathcal{P}(dN|\rho,\eta,\theta)$ is

(85) $\int_0^\infty \mathcal{L}_N(f|e^{-vh}\rho,\eta)\,\frac{v^{\theta-1}E[e^{-vT}|\rho,\eta]}{E[T^{-\theta}|\rho,\eta]\,\Gamma(\theta)}\,dv,$

as desired. For (ii), use (68) as follows:

(86) $\Gamma(n)\,\mathcal{L}_N(f|\rho,\eta) := \int_0^\infty E[T^{n}e^{-N(f)}e^{-vT}|\rho,\eta]\,v^{n-1}\,dv.$

Noting that $T^{n} := \int_{\mathcal{Y}^n}\prod_{i=1}^{n}\mu(dY_i)$, the result follows immediately by an application of (ii) in Proposition
3.1 and Corollary 3.1. Statement (iii) follows by first applying (i) to $\mathcal{P}(dN|\rho,\eta,\theta+n)$ in (72) and then
applying (ii) to $\mathcal{P}(dN|e^{-vh}\rho,\eta)$. ∎
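The mixture representation can be checked by hand in a simple case. The sketch below is an illustration under stated assumptions, not a construction from the text: it looks only at the total mass T, takes T to be Gamma(c, 1) so that $E[e^{-vT}] = (1+v)^{-c}$, and requires 1 ≤ θ < c so that $E[T^{-\theta}]$ is finite and the integrand is bounded. Under the tilted law (70), T is Gamma(c - θ, 1); under the mixture (79), T given V = v is Gamma with shape c and rate 1 + v, and V has density proportional to $v^{\theta-1}(1+v)^{-c}$. The two Laplace transforms should agree.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def tilted_laplace(lam, c, theta):
    """E[exp(-lam T)] under the T^(-theta)-tilted law when T ~ Gamma(c, 1)."""
    return (1.0 + lam) ** (theta - c)


def mixture_laplace(lam, c, theta):
    """Same quantity computed as a mixture over V, in the spirit of (79)-(80)."""
    # Gamma(theta) * E[T^(-theta)], the normalising constant of the density of V
    norm = math.gamma(theta) * math.gamma(c - theta) / math.gamma(c)

    def integrand(u):
        # substitution v = u / (1 - u) maps (0, inf) onto (0, 1)
        if u >= 1.0:
            return 0.0
        v = u / (1.0 - u)
        conditional = ((1.0 + v) / (1.0 + v + lam)) ** c  # E[exp(-lam T) | V = v]
        weight = v ** (theta - 1.0) * (1.0 + v) ** (-c)   # v^(theta-1) E[exp(-v T)]
        return conditional * weight / (1.0 - u) ** 2      # Jacobian dv/du

    return simpson(integrand, 0.0, 1.0) / norm


if __name__ == "__main__":
    print(tilted_laplace(0.8, 4.0, 1.5), mixture_laplace(0.8, 4.0, 1.5))
```

The agreement reflects the beta-prime integral $\int_0^\infty v^{\theta-1}(1+\lambda+v)^{-c}\,dv = (1+\lambda)^{\theta-c}B(\theta,c-\theta)$.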



REMARK 19. A clear distinction between a Poisson law $\mathcal{P}(dN|\rho,\eta)$ and the law on N defined as
$\mathcal{P}(dN|\rho,\eta,\theta)$ was made above, because the conditions and techniques used to
derive the results were slightly different. Hereafter, for brevity, the notation with θ will be used for all
objects. When θ is set equal to zero (when applicable) this will correspond to results for the Poisson-based
laws of N, μ, and P and their corresponding mixing laws.



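Corollary 5.1 (Mixture representations for μ) The results (i), (ii), (iii) in Theorem 5.1 imply analogous
results for $\mathcal{P}(d\mu|\rho,\eta,\theta)$ and $\mathcal{P}(d\mu|\rho,\eta)$.

(i) For θ > 0,

$\mathcal{P}(d\mu|\rho,\eta,\theta) := \int_0^\infty \mathcal{P}(d\mu|e^{-vh}\rho,\eta)\,\pi_\theta(dv|\rho,\eta).$

(ii) For n ≥ 1 and θ + n > 0, including -α < θ < 0, statements (ii) and (iii) in Theorem 5.1 imply that,
for each n, the random measures μ with distribution $\mathcal{P}(d\mu|\rho,\eta)$ or $\mathcal{P}(d\mu|\rho,\eta,\theta)$ can be represented as

(87) $\mu^* + \sum_{j=1}^{n(\mathbf{p})} h(J_{j,n})\,\delta_{Y_j^*},$

where $\mu^*|V,\mathbf{p}$ is $\mathcal{P}(d\mu|e^{-vh}\rho,\eta)$, and $(J_{j,n},Y_j^*)|V,\mathbf{p}$ are conditionally independent of μ* with distribution
$\mathcal{P}(dJ,dY^*|e^{-vh}\rho,\eta)$. The distribution of V, $\mathbf{p}$ is $\pi_{\theta+n,\theta}(dv,\mathbf{p}|\rho,\eta)$.

(iii) Equivalently, these results yield the following expressions for the respective Laplace functionals. For θ > 0,

(88) $\mathcal{L}_\mu(g|\rho,\eta,\theta) := \int_0^\infty \mathcal{L}_\mu(g|e^{-vh}\rho,\eta)\,\pi_\theta(dv|\rho,\eta),$

and for θ + n > 0, n ≥ 1,

(89) $\mathcal{L}_\mu(g|\rho,\eta,\theta) := \sum_{\mathbf{p}}\int_0^\infty \mathcal{L}_\mu(g|e^{-vh}\rho,\eta,\mathbf{p})\,\pi_{\theta+n,\theta}(dv,\mathbf{p}|\rho,\eta),$

where

$\mathcal{L}_\mu(g|e^{-vh}\rho,\eta,\mathbf{p}) := \mathcal{L}_\mu(g|e^{-vh}\rho,\eta)\prod_{j=1}^{n(\mathbf{p})}E\big[e^{-g(Y_j^*)h(J_{j,n})}\,\big|\,V=v,\mathbf{p}\big],$

the expectations being taken with respect to $\mathcal{P}(dJ,dY^*|e^{-vh}\rho,\eta)$.
Setting θ = 0 in (89) yields an identity for the Laplace functional of $\mathcal{P}(d\mu|\rho,\eta)$.
Now mixture representations for P are given. First set $T^* := \mu^*(\mathcal{Y})$, and define a random probability measure as

$P^*(\cdot) := \frac{\mu^*(\cdot)}{\mu^*(\mathcal{Y})} = \frac{\mu^*(\cdot)}{T^*}.$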



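Proposition 5.2 (Mixture representations for random probability measures)
(i) If θ > 0 then, for the random probability measure P,

(90) $\mathcal{P}(dP|\rho,\eta,\theta) := \int_0^\infty \mathcal{P}(dP|e^{-vh}\rho,\eta)\,\pi_\theta(dv|\rho,\eta).$

(ii) If P is $\mathcal{P}(dP|\rho,\eta,\theta)$ or $\mathcal{P}(dP|\rho,\eta)$ then for each n ≥ 1 and θ + n > 0, it is equivalent in distribution to
the random probability measure

(91) $p_n P^*(\cdot) + (1-p_n)\,\frac{\sum_{j=1}^{n(\mathbf{p})}h(J_{j,n})\,\delta_{Y_j^*}(\cdot)}{\sum_{j=1}^{n(\mathbf{p})}h(J_{j,n})},$

where $p_n := T^*/\big(T^* + \sum_{j=1}^{n(\mathbf{p})}h(J_{j,n})\big)$, with distribution specified by statement (ii) of Corollary 5.1.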

5.2 Duality of mixture representations and posterior distributions 

In this section $Y_1,\ldots,Y_n|P$ are iid random variables with distribution P. That is, the joint model
for $\mathbf{Y}|P$ is $\prod_{i=1}^{n}P(dY_i)$. The law of P is either $\mathcal{P}(dP|\rho,\eta)$ or $\mathcal{P}(dP|\rho,\eta,\theta)$. The interest is in obtaining
posterior distributions of N, μ and hence P, and relevant information about the marginal structure of $\mathbf{Y}$.
Define a conditional distribution of $V|\mathbf{Y}$ as

(92) $\pi_{\theta+n}(dv|\mathbf{p},\mathbf{Y}^*) \propto \Big[\prod_{j=1}^{n(\mathbf{p})}\kappa_{e_{j,n}}(e^{-vh}\rho|Y_j^*)\Big]\,\pi_{\theta+n}(dv).$

When ρ does not depend on y, the distribution of $V|\mathbf{Y}$ only depends on $\mathbf{p}$ and (92) reduces to

(93) $\pi_{\theta+n}(dv|\mathbf{p},\mathbf{Y}^*) \propto \Big[\prod_{j=1}^{n(\mathbf{p})}\kappa_{e_{j,n}}(e^{-vh}\rho)\Big]\,\pi_{\theta+n}(dv).$

Theorem 5.2 (i) The marginal distribution of $\mathbf{Y}$ is

(94) $\pi(d\mathbf{Y}) := \int_0^\infty\Big[\prod_{j=1}^{n(\mathbf{p})}\mathcal{P}(dY_j^*|e^{-vh}\rho,\eta)\Big]\,\pi_{\theta+n,\theta}(dv,\mathbf{p}|\rho,\eta).$

This implies that the quantity $\pi_{\theta+n,\theta}(\mathbf{p}|\rho,\eta)$ is an EPPF. When ρ does not depend on y, the marginal
distribution of $\mathbf{Y}$ is expressible as

(95) $\pi(d\mathbf{Y}) := \pi_{\theta+n,\theta}(\mathbf{p}|\rho,\eta)\prod_{j=1}^{n(\mathbf{p})}H(dY_j^*).$

(ii) Statement (i) combined with Theorem 5.1 implies that the posterior distributions of N and μ given $\mathbf{Y}$ are identical
to the mixtures

(96) $\int_0^\infty \mathcal{P}(dN,dJ|e^{-vh}\rho,\eta,\mathbf{Y}^*)\,\pi_{\theta+n}(dv|\mathbf{p},\mathbf{Y}^*)$ and $\int_0^\infty \mathcal{P}(d\mu|e^{-vh}\rho,\eta,\mathbf{Y}^*)\,\pi_{\theta+n}(dv|\mathbf{p},\mathbf{Y}^*),$

respectively.

(iii) Statement (ii) implies that the posterior distribution of P is determined by either of the laws in (96).
Combined with the mixture representations, this implies that the distribution of P given $\mathbf{Y}$ is equivalent to
the distribution of the random measure

(97) $p_n^*P^*(\cdot) + (1-p_n^*)\,\frac{\sum_{j=1}^{n(\mathbf{p})}h(J_{j,n})\,\delta_{Y_j^*}(\cdot)}{\sum_{j=1}^{n(\mathbf{p})}h(J_{j,n})},$

where the distributions of μ* and $(J_{j,n})$ given V, $\mathbf{Y}$ are specified by $\mathcal{P}(d\mu|e^{-vh}\rho,\eta,\mathbf{Y}^*)$, and $V|\mathbf{Y}$ is $\pi_{\theta+n}(dv|\mathbf{p},\mathbf{Y}^*)$.



PROOF. Given the mixture representations in Theorem 5.1, the result follows by an appeal to Fubini's
theorem, which identifies the posterior laws of N, μ, P as mixtures relative to the marginal distribution of
$\mathbf{Y}$. That is, for instance, $\mathcal{P}(dN) := \int_{\mathcal{Y}^n}\mathcal{P}(dN|\mathbf{Y})\,\pi(d\mathbf{Y})$. (More formally, one could evaluate the Laplace
functional of N on both sides.) Hence it suffices to identify the marginal distribution of $\mathbf{Y}$ and then apply
a simple algebraic rearrangement in the mixture representations of N given in statements (ii) and (iii) of
Theorem 5.1. The identification of $\pi(d\mathbf{Y})$ is straightforward. ∎

The conclusion that $\pi_{\theta+n,\theta}(\mathbf{p}|\rho,\eta)$ is an EPPF perhaps requires further discussion, as the technique I
used may be a bit unfamiliar. Essentially, from the theory of exchangeability, if $Y_1,\ldots,Y_n|P$ are iid P then
the marginal distribution of $\mathbf{Y} = (\mathbf{Y}^*,\mathbf{p})$ is exchangeable. Hence once the unique values $\mathbf{Y}^*$ are exposed, an
integration with respect to η leaves only a marginal distribution of $\mathbf{p}$, which must be an EPPF regardless
of whether or not the unique values $\{Y_1^*,\ldots,Y_{n(\mathbf{p})}^*\}$ are iid H, that is, regardless of whether or not P is a species
sampling model as described in Pitman (1996). What is lost is the 1-1 correspondence between ($\mathbf{p}$, H) and the
random probability measure P. For instance, it is conceivable that the EPPF PD($\mathbf{p}$|α, θ) could be embedded
in a model P which is not $\mathcal{PD}_{\alpha,\theta}(dP|H)$. In that case the $Y_1^*,\ldots,Y_{n(\mathbf{p})}^*$ cannot be iid H. In other words, an
EPPF can always be found by working with a P model, finding the joint marginal distribution of $\mathbf{Y}$, and
then marginalizing over the unique values. This is obvious when P is a species sampling model. However, it
is a simple matter to verify the addition rules given in Pitman (1995a, b, 1996) for an EPPF by applying the
Bayesian idea of a prediction rule. In particular, for each n evaluate

$E[P(\mathcal{Y})|\mathbf{Y}] := E[p_n^*|\mathbf{Y}] + E[(1-p_n^*)|\mathbf{Y}] := 1.$

Proper manipulation of the middle expression will yield the obvious rules.
5.3 Results for PD(α, θ)

A description of the $\mathcal{PD}_{\alpha,\theta}(d\mu|\eta)$ laws is now given. Throughout this section the notation $\mu_L$, $T_L$, etc. will be
used to denote the (conditional) law of μ, T depending on a random variable L.

5.3.1 Distributional properties of $\mathcal{PD}_{\alpha,\theta}(d\mu|\eta)$

Corollary 5.2 (i) If N is $\mathcal{P}(dN|\rho_\alpha,\eta)$ and h(s) := s, then for θ > 0, $\mathcal{P}(dN|\rho_\alpha,\eta,\theta)$ is such that N|V = v is
a Poisson random measure with intensity

(98) $\rho_{\alpha,v}(ds)\,\eta(dy) := e^{-vs}\rho_\alpha(ds)\,\eta(dy),$

corresponding to the Lévy measure of a generalised gamma process. The density of V is

(99) $\pi_\theta(dv|\rho_\alpha) := \frac{v^{\theta-1}e^{-Kv^{\alpha}}\,dv}{\Gamma(\theta)\,E[T^{-\theta}|\rho_\alpha,\eta]}.$

By a change of variable, L := $V^{\alpha}$ has a gamma distribution with parameters (θ/α, K). That
is, L is $\mathcal{G}(\theta/\alpha, K)$. The factor K is determined in part by the total mass of η. It can be dispensed with by
re-scaling.

REMARK 20. Now the connection to Pitman and Yor (1992, Section 3, p. 335-336) should be more
transparent. [See also Pitman and Yor (2001, Theorem 3).] The exponential law arises by setting K = 1 (or
by rescaling L) and the choice of θ = α.

Proposition 5.3 Corollary 5.2 implies that the distribution of $\mathcal{PD}_{\alpha,\theta}(d\mu|\eta)$ with respect to the mixing
distribution of L is

(100) $\mathcal{PD}_{\alpha,\theta}(d\mu|\eta) := \int_0^\infty \mathcal{P}(d\mu|\rho_{\alpha,L^{1/\alpha}},\eta)\,\mathcal{G}(dL|\theta/\alpha,K).$

In other words, μ|L is a generalized gamma random measure with Lévy measure $\rho_{\alpha,L^{1/\alpha}}(ds)\,\eta(dy)$ and L is
a gamma random variable with parameters (θ/α, K). When K = 1 and θ = α, L is a standard exponential
random variable. The Laplace functional of $\mathcal{PD}_{\alpha,\theta}(d\mu|\eta)$ can be expressed as

(101) $\int_{\mathcal{M}} e^{-\mu(g)}\,\mathcal{PD}_{\alpha,\theta}(d\mu|\eta) := \int_0^\infty \mathcal{L}_\mu(g|L,\alpha)\,\mathcal{G}(dL|\theta/\alpha,K),$

where

$\mathcal{L}_\mu(g|L,\alpha) := \exp\Big(-K\int_{\mathcal{Y}}\big[(g(y)+L^{1/\alpha})^{\alpha}-L\big]\,H(dy)\Big)$

is the conditional Laplace functional of μ given L.

(ii) Furthermore, suppose that conditioned on L, μ is multiplied by $L^{1/\alpha}$. Then the unconditional law of
$L^{1/\alpha}\mu_L$ is given by its Laplace functional,

(102) $E[e^{-L^{1/\alpha}\mu_L(g)}] := \int_0^\infty \mathcal{L}_\mu(L^{1/\alpha}g|L,\alpha)\,\mathcal{G}(dL|\theta/\alpha,K) := \Big[\int_{\mathcal{Y}}(g(y)+1)^{\alpha}\,H(dy)\Big]^{-\theta/\alpha}.$

(iii) Statement (ii) implies that the distribution of $L^{1/\alpha}T_L$ is $\mathcal{G}(\theta)$.

REMARK 21. Statement (iii) should be compared with T of Pitman and Yor (1997, Proposition 21).
Note that the explicit expression for the Laplace functional in (ii) is obtained via Brix's (1999) expression
for the generalised gamma measure.
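The computation behind statement (iii) can be spelled out in two lines. The following is a sketch, assuming only that the stable total mass satisfies $E[e^{-vT}|\rho_\alpha,\eta] = e^{-Kv^{\alpha}}$ as in (99), that $T_L$ given L has the law of T exponentially tilted by $e^{-L^{1/\alpha}T}$, and that g ≡ λ is taken constant in (102):

```latex
\begin{align*}
E\big[e^{-\lambda L^{1/\alpha}T_L}\,\big|\,L\big]
  &= \frac{E\big[e^{-(1+\lambda)L^{1/\alpha}T}\big]}{E\big[e^{-L^{1/\alpha}T}\big]}
   = \exp\!\big(-KL\,[(1+\lambda)^{\alpha}-1]\big),\\
E\big[e^{-\lambda L^{1/\alpha}T_L}\big]
  &= \int_0^\infty e^{-K\ell\,[(1+\lambda)^{\alpha}-1]}\,
     \frac{K^{\theta/\alpha}}{\Gamma(\theta/\alpha)}\,
     \ell^{\theta/\alpha-1}e^{-K\ell}\,d\ell
   = \big[(1+\lambda)^{\alpha}\big]^{-\theta/\alpha}
   = (1+\lambda)^{-\theta}.
\end{align*}
```

The last expression is the Laplace transform of a gamma random variable with shape θ and unit scale, and the constant K cancels, which is consistent with the observation that K can be dispensed with by re-scaling.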

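Now noting that

(103) $\prod_{j=1}^{n(\mathbf{p})}\int_{\mathcal{Y}}\kappa_{e_{j,n}}(e^{-vh}\rho_\alpha|y)\,\eta(dy) := v^{-(n-n(\mathbf{p})\alpha)}\,\eta(\mathcal{Y})^{n(\mathbf{p})}\prod_{j=1}^{n(\mathbf{p})}\Gamma(e_{j,n}-\alpha),$

the results below are easily deduced.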

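Proposition 5.4 (i) For all θ > -α and n ≥ 1, the $\mathcal{PD}_{\alpha,\theta}(d\mu|\eta)$ law is representable as the random
measure

(104) $\mu^* + \sum_{j=1}^{n(\mathbf{p})}J_{j,n}\,\delta_{Y_j^*}(\cdot),$

where μ* given a random variable L is a generalised gamma random measure with intensity $\rho_{\alpha,L^{1/\alpha}}(ds)\,\eta(dy)$.
The $(J_{j,n})$ given (L, $\mathbf{Y}^*$) are independent of μ* with respective gamma distributions $\mathcal{G}(e_{j,n}-\alpha, L^{1/\alpha})$. L
given $\mathbf{p}$ is $\mathcal{G}(n(\mathbf{p})+\theta/\alpha, K)$ for some constant K. In particular, by cancellation one can set K = α or K = 1.
Conditionally independent of L, $\{Y_1^*,\ldots,Y_{n(\mathbf{p})}^*\}$ are iid $\eta(\cdot)/\eta(\mathcal{Y}) := H(\cdot)$.

(ii) The distribution of $\mathbf{p}$ is the EPPF PD($\mathbf{p}$|α, θ). In addition, the marginal law of $\mu^*|\mathbf{p}$ is $\mathcal{PD}_{\alpha,\theta+n(\mathbf{p})\alpha}(d\mu|\eta)$,
which follows since, similar to (101), its Laplace functional is

(105) $\int_0^\infty \mathcal{L}_\mu(g|L,\alpha)\,\mathcal{G}(dL|n(\mathbf{p})+\theta/\alpha,K).$

(iii) Hence the random probability measure

$P^*(\cdot) := \frac{\mu^*(\cdot)}{T^*}$

given $\mathbf{p}$ is $\mathcal{PD}_{\alpha,\theta+n(\mathbf{p})\alpha}(dP|H)$. Denote the random probability measure with this law as $P_{\alpha,\theta+n(\mathbf{p})\alpha}$.

(iv) Given L, $\mathbf{p}$, the random variables $G_{j,n} := L^{1/\alpha}J_{j,n}$ are independent $\mathcal{G}(e_{j,n}-\alpha)$, independent of L and
μ*. Moreover, given $\mathbf{p}$ the distribution of $L^{1/\alpha}\mu^*_L$ is given by the Laplace functional

$\Big[\int_{\mathcal{Y}}(g(y)+1)^{\alpha}\,H(dy)\Big]^{-(\theta+n(\mathbf{p})\alpha)/\alpha},$

and hence $L^{1/\alpha}T^*_L$ given $\mathbf{p}$ has distribution $\mathcal{G}(\theta+n(\mathbf{p})\alpha)$.

REMARK 22. Suppose that K = 1 and n = 1; then in particular for the stable case $\mathcal{PD}_{\alpha,0}(d\mu|\eta)$, it
follows that L is exponential(1). For the $\mathcal{PD}_{\alpha,\alpha}(d\mu|\eta)$, L is $\mathcal{G}(2, 1)$.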
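Statement (ii) of Proposition 5.4 can be checked numerically for θ > 0. The sketch below is illustrative and rests on assumptions that are not spelled out in the text: it takes $\rho_\alpha(ds) = s^{-\alpha-1}ds$ (the normalisation under which (103) holds as written), so that $K = \eta(\mathcal{Y})\Gamma(1-\alpha)/\alpha$, and it requires θ/α ≥ 1 only so that the integrands stay bounded at zero. It evaluates the mixture form of the EPPF from (78) and (103) after the change of variable L = $V^{\alpha}$, and compares it with Pitman's closed form for PD($\mathbf{p}$|α, θ). All function names are hypothetical.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def rising(x, m):
    """Rising factorial x (x + 1) ... (x + m - 1), with rising(x, 0) = 1."""
    out = 1.0
    for i in range(m):
        out *= x + i
    return out


def eppf_pitman(sizes, alpha, theta):
    """Pitman's closed form for the PD(alpha, theta) EPPF with block sizes `sizes`."""
    n, k = sum(sizes), len(sizes)
    num = 1.0
    for i in range(1, k):
        num *= theta + i * alpha
    blocks = 1.0
    for e in sizes:
        blocks *= rising(1.0 - alpha, e - 1)
    return num * blocks / rising(theta + 1.0, n - 1)


def eppf_mixture(sizes, alpha, theta, mass=1.0):
    """EPPF from (78) and (103), integrating numerically over L = V^alpha (theta > 0)."""
    n, k = sum(sizes), len(sizes)
    K = mass * math.gamma(1.0 - alpha) / alpha  # E[exp(-v T)] = exp(-K v^alpha)
    upper = 80.0 / K                            # truncation point, tail is negligible

    def gamma_integral(a):
        # int_0^inf l^(a-1) exp(-K l) dl, evaluated numerically (a >= 1 assumed)
        return simpson(lambda l: l ** (a - 1.0) * math.exp(-K * l), 0.0, upper)

    # E[T^(-theta)] via (67) and the substitution l = v^alpha
    neg_moment = gamma_integral(theta / alpha) / (alpha * math.gamma(theta))
    # int_0^inf v^(theta+n-1) exp(-K v^alpha) v^(-(n - k alpha)) dv, same substitution
    mix = gamma_integral(theta / alpha + k) / alpha
    kappa = mass ** k
    for e in sizes:
        kappa *= math.gamma(e - alpha)
    return kappa * mix / (math.gamma(theta + n) * neg_moment)


if __name__ == "__main__":
    for sizes in [(2,), (1, 1), (3, 1, 1), (2, 2, 1)]:
        print(sizes, eppf_mixture(sizes, 0.5, 1.0), eppf_pitman(sizes, 0.5, 1.0))
```

The total mass of η cancels between the product of moments and the two gamma integrals, so the result does not depend on `mass`; this mirrors the remark that K can be removed by cancellation.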

5.3.2 Identities for PD(α, θ)

The propositions above are now applied to derive an alternate representation for the distribution of the
general PD(α, θ) and related models. This, in particular, generalizes the right-hand side of the construction
of a PD(α, α) model given in Pitman and Yor (2001, Example 8, eq. (33)) to previously unknown ones for
the general PD(α, θ) model.
Notice that using the change of variable u = vs, $e^{-vs}\rho_\alpha(ds)$ transforms to the Lévy measure

(106) $v^{\alpha}e^{-u}\rho_\alpha(du),$

which yields the equivalence of the sets

(107) $\{y : \int_y^\infty e^{-vs}\rho_\alpha(ds) < x\} := \{u : \int_u^\infty e^{-s}\rho_\alpha(ds) < x/v^{\alpha}\}.$

Define

(108) $A^{-1}(x) := \inf\{u : \int_u^\infty e^{-s}\rho_\alpha(ds) < x\},$

and set $\Gamma_j := \sum_{i=1}^{j}E_i$ for $(E_i)$ a collection of independent standard exponential random variables. In addition,
define for all θ > 0

(109) $\Sigma_{\theta/\alpha} := \sum_{j=1}^{\infty}A^{-1}(\Gamma_j/L),$

where L is a $\mathcal{G}(\theta/\alpha)$ random variable, independent of $(E_i)$. Now using the change of variable L = $V^{\alpha}$, (107)
and Propositions 5.3 and 5.4, it is easy to see from an application of Khintchine's (1937) inverse Lévy method [see
also Ferguson and Klass (1972), Wolpert and Ickstadt (1998b), Sato (1999), Rosinski (2001), and Banjevic,
Ishwaran and Zarepour (2002)] that the following identities hold.

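Proposition 5.5 (Distributional representations for PD(α, θ)) Choose 0 < α < 1.

(i) Then for θ > 0, the distribution of the sequence

(110) $\big(A^{-1}(\Gamma_j/L)/\Sigma_{\theta/\alpha};\ j = 1,2,\ldots\big)$

is PD(α, θ).

(ii) Let $(Z_j)$ denote an iid sequence with distribution H chosen independently of L and $(E_j)$. If θ > 0, then,
equivalent to (i), the random probability measure

(111) $P_{\alpha,\theta}(\cdot) := \sum_{j=1}^{\infty}\frac{A^{-1}(\Gamma_j/L)}{\Sigma_{\theta/\alpha}}\,\delta_{Z_j}(\cdot)$

is $\mathcal{PD}_{\alpha,\theta}(dP|H)$.

(iii) For all θ > -α and n ≥ 1, a $\mathcal{PD}_{\alpha,\theta}(dP|H)$ random probability measure is representable as

(112) $P_{\alpha,\theta}(\cdot) := \frac{\Sigma_{n(\mathbf{p})+\theta/\alpha}}{\Sigma_{n(\mathbf{p})+\theta/\alpha}+\sum_{j=1}^{n(\mathbf{p})}J_{j,n}}\,P_{\alpha,\theta+n(\mathbf{p})\alpha}(\cdot) + \frac{\sum_{j=1}^{n(\mathbf{p})}J_{j,n}\,\delta_{Y_j^*}(\cdot)}{\Sigma_{n(\mathbf{p})+\theta/\alpha}+\sum_{j=1}^{n(\mathbf{p})}J_{j,n}},$

where, for clarity, $\Sigma_{n(\mathbf{p})+\theta/\alpha} := \sum_{j=1}^{\infty}A^{-1}(\Gamma_j/L)$. $L|\mathbf{p}$ is gamma distributed with parameters ($n(\mathbf{p})+\theta/\alpha$, 1),
$(J_{j,n})|L,\mathbf{p}$ are respectively independent $\mathcal{G}(e_{j,n}-\alpha, L^{1/\alpha})$ and $(Y_j^*)|\mathbf{p}$ are iid H. Note that the $(J_{j,n})$ are not
independent of $\Sigma_{n(\mathbf{p})+\theta/\alpha}$.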

Now to obtain another representation of $P_{\alpha,\theta}$, which also serves to directly recover Pitman's (1996)
description of the posterior distribution, notice from Propositions 5.3 and 5.4 that given $\mathbf{p}$ the following
equivalence in distribution holds for each n ≥ 1:

(113) $p_n := \frac{L^{1/\alpha}T^*_L}{L^{1/\alpha}T^*_L + L^{1/\alpha}\sum_{j=1}^{n(\mathbf{p})}J_{j,n}} := \frac{G_{\theta+n(\mathbf{p})\alpha}}{G_{\theta+n(\mathbf{p})\alpha}+\sum_{j=1}^{n(\mathbf{p})}G_{j,n}}.$

[This is also true for n = 0 by Proposition 5.3.] The key point is that the quantity above is independent of the
mixing distribution on L for all n. Moreover, $L^{1/\alpha}T^*_L := G_{\theta+n(\mathbf{p})\alpha}$ is $\mathcal{G}(\theta+n(\mathbf{p})\alpha)$ and independent of the
gamma random variables $(G_{j,n})$ as defined in Proposition 5.4. Hence the following result.

Proposition 5.6 For all θ > -α,

(114) $P_{\alpha,\theta}(\cdot) := p_n\,P_{\alpha,\theta+n(\mathbf{p})\alpha}(\cdot) + (1-p_n)\,\frac{\sum_{j=1}^{n(\mathbf{p})}G_{j,n}\,\delta_{Y_j^*}(\cdot)}{\sum_{j=1}^{n(\mathbf{p})}G_{j,n}},$

where

$p_n = \frac{G_{\theta+n(\mathbf{p})\alpha}}{G_{\theta+n(\mathbf{p})\alpha}+\sum_{j=1}^{n(\mathbf{p})}G_{j,n}}.$

Given $\mathbf{p}$ the quantity above does not depend on the mixing distribution L. Hence given $\mathbf{Y} = (\mathbf{Y}^*,\mathbf{p})$ the posterior
distribution of a P which is $\mathcal{PD}_{\alpha,\theta}(dP|H)$ is immediately seen to be equivalent to the random measure
on the right of (114) when ($\mathbf{Y}^*$) and $\mathbf{p}$ are fixed. This corresponds exactly with the posterior distribution
described in Pitman (1996).
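The finite-dimensional part of (114) is easy to simulate, since it only involves independent gamma variables with unit scale. The sketch below is a minimal illustration, not the author's algorithm: the function name and the use of Python's `random` module are assumptions, and the infinite-dimensional component $P_{\alpha,\theta+n(\mathbf{p})\alpha}$ is deliberately left out. It draws $p_n$ and the normalised weights attached to the observed unique values, given the block sizes $e_{j,n}$.

```python
import random


def sample_posterior_weights(sizes, alpha, theta, rng):
    """Draw (p_n, weights) for the finite part of (114), given block sizes e_{j,n}."""
    k = len(sizes)
    # G_{theta + n(p) alpha}: gamma with shape theta + k * alpha and unit scale
    g_rest = rng.gammavariate(theta + k * alpha, 1.0)
    # G_{j,n}: independent gammas with shapes e_{j,n} - alpha and unit scale
    g_blocks = [rng.gammavariate(e - alpha, 1.0) for e in sizes]
    total_blocks = sum(g_blocks)
    p_n = g_rest / (g_rest + total_blocks)
    weights = [g / total_blocks for g in g_blocks]  # masses on the observed Y_j^*
    return p_n, weights


if __name__ == "__main__":
    rng = random.Random(0)
    p_n, weights = sample_posterior_weights((3, 1, 1), 0.5, 1.0, rng)
    print(p_n, weights)
```

By the usual gamma and beta algebra, $p_n$ is Beta($\theta+n(\mathbf{p})\alpha$, $n-n(\mathbf{p})\alpha$) and the weights are Dirichlet with parameters $(e_{j,n}-\alpha)$, so the posterior mass assigned to a previously unseen value has mean $(\theta+n(\mathbf{p})\alpha)/(\theta+n)$.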



REMARK 23. It is certainly obvious that one could use Pitman's (1996) posterior characterization
to obtain the simple mixture representation in Proposition 5.6. The main point, however, is really how the
previous descriptions (which are less obvious) led up to this result. Moreover, none of the arguments appealed
to the stick-breaking representation of $\mathcal{PD}_{\alpha,\theta}(dP|H)$.

REMARK 24. The analysis of the PD(α, θ) models revealed various independence structures via a
simple transformation. In general this will not be the case, but there are certainly many instances where an
appropriate transformation of the $(J_{j,n})$ will render them independent of μ* and the mixing distribution
on V. Such an operation should simplify the analysis. One might try this with the intensities described in
Pitman and Yor (2001).

REMARK 25. I wonder what interpretation, if any, an adjustment to the left-hand side of Pitman
and Yor (2001, Theorem 3, eq. (19)) has when eo is replaced by what one might guess from Proposition 5.3
to be a $\mathcal{G}(\theta/\alpha)$ random variable.

5.4 Results for the Dirichlet Process and generalised gamma process 

Results for the gamma process with shape η(·) := θH(·), that is $\mathcal{PD}_{0,\theta}(d\mu|\eta)$, follow by using $\rho_{0,1}$ in place
of $\rho_\alpha$. In this case T is $\mathcal{G}(\theta)$, and

(115) $\prod_{j=1}^{n(\mathbf{p})}\int_{\mathcal{Y}}\kappa_{e_{j,n}}(e^{-vh}\rho_{0,1}|y)\,\eta(dy) := (1+v)^{-n}\,\theta^{n(\mathbf{p})}\prod_{j=1}^{n(\mathbf{p})}\Gamma(e_{j,n}),$

which yields readily:

Proposition 5.7 (i)

(116) $\pi_{n,0}(dv,\mathbf{p}) := PD(\mathbf{p}|\theta)\,\tau_{\theta,n}(dv),$

where

(117) $\tau_{\theta,n}(dv) := \frac{\Gamma(\theta+n)}{\Gamma(\theta)\Gamma(n)}\,(1+v)^{-(n+\theta)}\,v^{n-1}\,dv,$

which implies that V and $\mathbf{p}$ are independent. Note the density $\tau_{\theta,n}(dv)$ is well defined provided θ > 0.

(ii) The $(J_{j,n})|V,\mathbf{p}$ are independent $\mathcal{G}(e_{j,n}, 1+V)$ and the distribution of $\mu^*|V,\mathbf{p}$ is a (simple) weighted
gamma process, i.e. has intensity $\rho_{0,1+V}\,\theta H$, and is independent of $\mathbf{p}$.

(iii) Hence it follows that given V and $\mathbf{p}$

(118) $(V+1)\Big[\mu^* + \sum_{j=1}^{n(\mathbf{p})}J_{j,n}\,\delta_{Y_j^*}\Big]$

is a mixture of gamma processes independent of V. That is, additionally given $\mathbf{Y}^*$, the measure in (118) is
a gamma process with shape $\theta H(\cdot) + \sum_{j=1}^{n(\mathbf{p})}e_{j,n}\,\delta_{Y_j^*}(\cdot)$. This fact serves to recover the well-known result of
Ferguson (1973) for the posterior distribution of the Dirichlet process.



REMARK 26. It is not so surprising that the gamma/Dirichlet model is such that the mixing distribution
V and $\mathbf{p}$ are independent. It is also perhaps true, given the properties of PD(θ), that this is the only
species sampling model with this property.

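The arguments above may be applied to the generalised gamma model with intensity $\rho_{\alpha,b}$. In this case,
from Section 3, it follows that

(119) $\prod_{j=1}^{n(\mathbf{p})}\int_{\mathcal{Y}}\kappa_{e_{j,n}}(e^{-vh}\rho_{\alpha,b}|y)\,\eta(dy) := (v+b)^{-(n-n(\mathbf{p})\alpha)}\,\theta^{n(\mathbf{p})}\prod_{j=1}^{n(\mathbf{p})}\Gamma(e_{j,n}-\alpha),$

where it is assumed that $\eta(\mathcal{Y}) := \theta$. This leads to a joint density of V, $\mathbf{p}$ specified as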

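(120) $\frac{(v+b)^{-(n-n(\mathbf{p})\alpha)}\,v^{n-1}\,e^{-[(v+b)^{\alpha}-b^{\alpha}]\theta}\,\theta^{n(\mathbf{p})}\prod_{j=1}^{n(\mathbf{p})}\Gamma(e_{j,n}-\alpha)}{\Gamma(n)}.$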



In addition, the $(J_{j,n})|V,\mathbf{p}$ are $\mathcal{G}(e_{j,n}-\alpha, b+V)$ and $\mu^*|V,\mathbf{p}$ is a generalised gamma process with Lévy
measure $\rho_{\alpha,b+V}$. Hence the $(b+V)J_{j,n}$ are $\mathcal{G}(e_{j,n}-\alpha)$. The Laplace functional of $(V+b)\mu^*|V,\mathbf{p}$ is

(121) $\exp\Big(-(V+b)^{\alpha}\int_{\mathcal{Y}}\big[(g(y)+1)^{\alpha}-1\big]\,\eta(dy)\Big).$

REMARK 27. In order to incorporate larger classes of models for P one could use a weighted Poisson
law

(122) $\mathcal{Q}(dN|\rho,\eta) := \frac{w(N)\,\mathcal{P}(dN|\rho,\eta)}{E[w(N)]}$

for an arbitrary integrable function w. This will be used in Section 8.
6 Distributions of joint linear functionals of P; variations of the
Markov-Moment problem

This section is a continuation of the previous one. Here it is shown that the joint Cauchy-Stieltjes transform
of linear functionals of P, whose laws are $\mathcal{P}(dP|\rho,\eta,\theta)$ and $\mathcal{P}(dP|\rho,\eta)$, is equivalent to expressions involving the
Laplace functional of random measures $V\mu_V$. The precise meaning of $V\mu_V$ will be clear from the context
below. The method of proof, given the results in the previous section, is an easy extension of the beautiful
approach used by Tsilevich, Vershik and Yor (2000) for the Dirichlet process and the general $\mathcal{PD}_{\alpha,\theta}(dP|H)$
family. The results given here represent the most general ones that I know of. More importantly, the explicit
relationship between P and $V\mu_V$ is a new insight. See Kerov (1998) for many implications of this type of
result.

Let $f_l$ denote real-valued functions on $\mathcal{Y}$ and define $Pf_l = \int_{\mathcal{Y}} f_l(y)\,P(dy)$ for l = 1, ..., q. In addition let
$z_l$ for l = 1, ..., q denote non-negative scalars. In this section the calculation of the following transform (in
relation to Laplace functionals of $V\mu_V$) is discussed:

(123) $\int_{\mathcal{M}}\Big(1+\sum_{l=1}^{q}z_l\,Pf_l\Big)^{-(\theta+n)}\mathcal{P}(dP|\rho,\eta,\theta)$

for all θ > -α. Again, when θ = 0 this coincides with the $\mathcal{P}(dP|\rho,\eta)$ laws. The quantity characterizes the
joint distribution of $(Pf_1,\ldots,Pf_q)$. Kerov and Tsilevich (1998), in the case of $\mathcal{PD}_{\alpha,\theta}(dP|H)$, used combinatorial
arguments to obtain the moment expressions in the case of the Dirichlet and two-parameter models to yield
extensions of the Markov-Krein identity for $(Pf_1,\ldots,Pf_q)$. Their results extend the work of Cifarelli and
Regazzini (1990) for the case of the distribution of the mean functional, $\int y\,P(dy)$, when P has a Dirichlet
process law. The mean case is also discussed in Diaconis and Kemperman (1996), where in addition the result
for the joint distribution of functionals like $(Pf_1,\ldots,Pf_q)$ was proposed as an open problem. Tsilevich (1997)
establishes the case for the mean with respect to the general two-parameter processes. These results used hard
analytic techniques which would not be easily extendable to a general scenario. However, recently Tsilevich,
Vershik and Yor (2000) devised a beautiful simple proof of the corresponding result in the case of the Dirichlet
process and the general two-parameter extension via Laplace functionals. Given the results in Section 5.1 it
is a simple matter to extend their result to the general class of probability measures $\mathcal{P}(dP|\rho,\eta,\theta)$. That is,
following closely their approach, relationships between (123) and the Laplace functional of $V\mu_V$ are established.
The results below follow by rewriting

$\frac{1}{1+\sum_{l=1}^{q}z_l\,Pf_l} = \frac{T}{T+\sum_{l=1}^{q}z_l\,\mu(f_l)}$

and applying Corollary 5.1.

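6.1 Joint Cauchy-Stieltjes transforms and Laplace functionals
Proposition 6.1 For θ > 0,

(124) $\int_{\mathcal{M}}\Big(1+\sum_{l=1}^{q}z_l\,Pf_l\Big)^{-\theta}\mathcal{P}(dP|\rho,\eta,\theta) := \mathcal{L}_{V\mu_V}\Big(\sum_{l=1}^{q}z_l f_l\,\Big|\,\rho,\eta,\theta\Big),$

where

$\mathcal{L}_{V\mu_V}\Big(\sum_{l=1}^{q}z_l f_l\,\Big|\,\rho,\eta,\theta\Big) := \int_0^\infty \mathcal{L}_\mu\Big(v\sum_{l=1}^{q}z_l f_l\,\Big|\,e^{-vh}\rho,\eta\Big)\,\pi_\theta(dv|\rho,\eta).$

Proposition 6.2 For θ + n > 0 and n ≥ 1,

(i)

$\int_{\mathcal{M}}\Big(1+\sum_{l=1}^{q}z_l\,Pf_l\Big)^{-(\theta+n)}\mathcal{P}(dP|\rho,\eta,\theta) = \sum_{\mathbf{p}}\int_0^\infty \mathcal{L}_\mu\Big(v\sum_{l=1}^{q}z_l f_l\,\Big|\,e^{-vh}\rho,\eta,\mathbf{p}\Big)\,\pi_{\theta+n,\theta}(dv,\mathbf{p}|\rho,\eta).$

(ii) The expressions in (i) are equal to

$\sum_{\mathbf{p}}\int_{\mathcal{Y}^{n(\mathbf{p})}}\Big[\int_0^\infty \mathcal{L}_\mu\Big(v\sum_{l=1}^{q}z_l f_l\,\Big|\,e^{-vh}\rho,\eta,\mathbf{p},\mathbf{Y}^*\Big)\,\pi_{\theta+n,\theta}(dv|\rho,\eta,\mathbf{p},\mathbf{Y}^*)\Big]\,\pi(d\mathbf{Y}),$

which indicates that the posterior Cauchy-Stieltjes transform

$\int_{\mathcal{M}}\Big(1+\sum_{l=1}^{q}z_l\,Pf_l\Big)^{-(\theta+n)}\mathcal{P}(dP|\rho,\eta,\theta,\mathbf{Y})$

is equal to

(125) $\int_0^\infty \mathcal{L}_\mu\Big(v\sum_{l=1}^{q}z_l f_l\,\Big|\,e^{-vh}\rho,\eta,\mathbf{p},\mathbf{Y}^*\Big)\,\pi_{\theta+n,\theta}(dv|\rho,\eta,\mathbf{p},\mathbf{Y}^*).$

Note again the various relationships to the (posterior) laws of $V\mu_V$.

Now setting $g(y) := \sum_{l=1}^{q}z_l f_l(y)$ in the Laplace transform of $L^{1/\alpha}\mu_L$ in statement (ii) of Proposition 5.3
yields

$\Big[\int_{\mathcal{Y}}\Big(\sum_{l=1}^{q}z_l f_l(y)+1\Big)^{\alpha}H(dy)\Big]^{-\theta/\alpha}.$

Hence the result of Tsilevich, Vershik and Yor (2000) for the $\mathcal{PD}_{\alpha,\theta}(dP|H)$ family is recovered. However, the
relationship to $L^{1/\alpha}\mu_L$ is not noted in their work.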
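A small numerical check can make the Dirichlet-process case of this correspondence concrete. The sketch below is an illustration under assumptions that go beyond the text: it uses the classical Markov-Krein (Cifarelli-Regazzini) identity $E[(1+z\,Pf)^{-\theta}] = \exp(-\theta\int\log(1+zf(y))\,H(dy))$ for a Dirichlet process with shape θH, which is the α → 0 counterpart of the display above, and it takes H supported on two atoms so that Pf is a function of a single Beta variable. The function names and parameter values are hypothetical.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def cauchy_stieltjes_dirichlet(z, f1, f2, a1, a2):
    """E[(1 + z Pf)^(-(a1 + a2))] when P has masses (W, 1 - W) on two atoms, W ~ Beta(a1, a2).

    Here a1 = theta * H({atom 1}) and a2 = theta * H({atom 2}); a1, a2 >= 1 is assumed
    so that the Beta density is bounded on [0, 1].
    """
    beta_const = math.gamma(a1) * math.gamma(a2) / math.gamma(a1 + a2)

    def integrand(w):
        density = w ** (a1 - 1.0) * (1.0 - w) ** (a2 - 1.0) / beta_const
        pf = f1 * w + f2 * (1.0 - w)  # the linear functional Pf
        return (1.0 + z * pf) ** (-(a1 + a2)) * density

    return simpson(integrand, 0.0, 1.0)


def markov_krein_rhs(z, f1, f2, a1, a2):
    """exp(-theta * int log(1 + z f) dH) for the two-atom base measure."""
    return math.exp(-(a1 * math.log(1.0 + z * f1) + a2 * math.log(1.0 + z * f2)))


if __name__ == "__main__":
    args = (1.5, 0.3, 2.0, 2.0, 2.0)  # z, f1, f2, theta*H1, theta*H2 with theta = 4
    print(cauchy_stieltjes_dirichlet(*args), markov_krein_rhs(*args))
```

In this two-atom case the identity reduces to the classical integral $\int_0^1 w^{a_1-1}(1-w)^{a_2-1}(aw+b(1-w))^{-(a_1+a_2)}\,dw = B(a_1,a_2)\,a^{-a_1}b^{-a_2}$.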



6.2 A remark on moment calculations 

As mentioned previously, Kerov and Tsilevich (1998) used nontrivial combinatorial arguments to calculate
the joint moments of $(Pf_1,\ldots,Pf_q)$ in the case of $\mathcal{PD}_{\alpha,\theta}(dP|H)$. Here, similar to Ishwaran and James (2001a)
for species sampling models, it is demonstrated that one can easily obtain the relevant moment calculations
by using Theorem 5.2. This calculation will only be presented for the case where ρ does not depend on y.
The task is to calculate



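$E\Big[\prod_{l=1}^{q}(Pf_l)^{n_l}\Big] = \int_{\mathcal{M}}\Big[\prod_{l=1}^{q}\prod_{i=1}^{n_l}\int_{\mathcal{Y}}f_l(y_{i,l})\,P(dy_{i,l})\Big]\,\mathcal{P}(dP|\rho,\eta,\theta).$

Now an application of Theorem 5.2, with $n = n_1+\cdots+n_q$, yields the result

(126) $E\Big[\prod_{l=1}^{q}(P(f_l))^{n_l}\Big] = \sum_{\mathbf{p}}\pi_{\theta+n,\theta}(\mathbf{p}|\rho,\eta)\prod_{j=1}^{n(\mathbf{p})}\int_{\mathcal{Y}}\Big[\prod_{l=1}^{q}f_l(u)^{e_{j,l}}\Big]H(du),$

where $e_{j,l}$ denotes the number of the $n_l$ factors involving $f_l$ that fall in the j-th block of $\mathbf{p}$.
7 Posterior Calculus for Extended Neutral to the Right processes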

In this section I focus on the concept of neutral to the right (NTR) processes originally proposed in Doksum
(1974). The Dirichlet process is the most notable member of this class. Here, a new natural extension
of the NTR concept to more abstract spaces is given. It is then shown how Proposition 3.1 can be used
to yield the most transparent and simplest posterior analysis of such models. This includes the case of
survival data models subject to right censoring when an NTR prior is used or, synonymously, when Lévy
process priors are used to model the cumulative hazard. Additionally, using Proposition 3.1 a change of
measure formula is established which relates Beta/Dirichlet processes to their more complex
Beta/Beta-Neutral (Stacy) generalizations.

REMARK 28. Doksum (1974, Theorem 3.1) establishes essentially a 1-1 correspondence between NTR
processes and exponential functions of subordinators. See below for explicit details. This fact seems not to
be widely noticed by probabilists investigating problems where models under the latter description arise.
One consequence is that the calculus that is described below for NTR's can be exploited in other areas
besides Bayesian nonparametrics. Here I will omit the drift component. It is a simple matter to make
adjustments starting from the obvious modification of Lemma 2.1 (see Remark 2). Doksum (1974, Corollary
3.2) establishes the almost sure discreteness of NTR's under the condition that the drift component is zero.

REMARK 29. The notation $\Lambda$ will be used in this section to denote a random cumulative hazard
measure. The dependence of the quantities $F$, $\Lambda$ on $\rho$, $\eta$ will be suppressed. The arguments $(s)$, $(t)$ will be used
to denote time as is usual in survival analysis. The argument $(u)$ plays the role of $(s)$ in the previous sections.

First the original definition proposed by Doksum is given.

DEFINITION 1. (Doksum (1974)) A random distribution function $F$ on the positive real line is said
to be neutral to the right if for each $k \ge 1$ and $t_1 < t_2 < \ldots < t_k$, there exist non-negative independent random
variables $V_1, \ldots, V_k$ such that the vectors satisfy,

(127) $\mathcal{L}\{(F(t_1), F(t_2) - F(t_1), \ldots, F(t_k) - F(t_{k-1}))\} = \mathcal{L}\{(V_1, V_2(1 - V_1), \ldots, V_k \prod_{i=1}^{k-1}(1 - V_i))\}$,

where $\mathcal{L}$ denotes the law.
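The right-hand side of (127) can be illustrated with a small sketch. It assumes only that a vector of values $V_1, \ldots, V_k$ in $[0,1]$ is given; the function name `ntr_increments` is hypothetical and is not taken from the paper. The running product plays the role of the survival function at the previous time point.

```python
from math import prod


def ntr_increments(v):
    """Map V_1, ..., V_k in [0, 1] to (V_1, V_2(1 - V_1), ..., V_k * prod_{i<k}(1 - V_i))."""
    increments = []
    survival = 1.0  # running product prod_{i<j}(1 - V_i), i.e. 1 - F(t_{j-1})
    for v_j in v:
        increments.append(v_j * survival)
        survival *= 1.0 - v_j
    return increments


# Illustrative values only; in (127) the V_j are independent random variables.
v = [0.5, 0.2, 0.25]
inc = ntr_increments(v)
total_mass = sum(inc)                           # F(t_k)
remaining = prod(1.0 - v_j for v_j in v)        # 1 - F(t_k)
```

The increments telescope, so `total_mass + remaining` equals one, which mirrors $S(t_k) = \prod_{i \le k}(1 - V_i)$.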

In the special case where $F$ is a Dirichlet process with shape $\theta F_0(\cdot)$, each increment $F(t_k) - F(t_{k-1})$
has the Beta distribution $\mathcal{B}(\theta F_0((t_{k-1}, t_k]);\ \theta[1 - F_0((t_{k-1}, t_k])])$. Doksum discusses various equivalences and implications of this
definition. From Theorem 3.1 of Doksum it follows that $F$ is an NTR process if and only if for $t > 0$,

(128) $S(t) = 1 - F(t) = e^{-Z(t)}$,

where $Z$ is an increasing Levy process satisfying $Z(0) = 0$ and $\lim_{t \to \infty} Z(t) = \infty$. The analysis here will
consider subordinators $Z$ without a drift component. In other words, $Z$ is a completely random measure on
$(0, \infty)$ with associated intensity $\rho_Z(du|y)\eta(dy)$ for $(u,y) \in (0, \infty) \times (0, \infty)$. Ferguson (1974) shows that a
Dirichlet process with finite shape measure, $\theta F_0(\cdot)$, results if

(129) $\rho_Z(du|y)\eta(dy) = \frac{1}{1 - e^{-u}}\, e^{-u\theta F_0([y,\infty))}\, du\, \theta F_0(dy)$.

It follows from the theory of product integration that an NTR process can also be represented as

(130) $S(t) = \prod_{u \le t} (1 - \Lambda(du))$,

where $\Lambda$ denotes a cumulative hazard which is further modelled as a completely random measure with
intensity $\rho_\Lambda(du|y)\eta(dy)$ for $(u,y) \in (0,1] \times (0, \infty)$. The symbol $\prod$ denotes the product integral, which has
played a primary role in survival analysis. In particular the Kaplan-Meier estimator for $S$, Kaplan and Meier
(1958), is obtained by replacing $\Lambda$ by its empirical counterpart, the Nelson-Aalen estimator. See the text
by Andersen, Borgan, Gill and Keiding (1993) for further elaboration. Gill and Johansen (1990) discuss in
detail the properties of the product integral. The product integral is also expressible as,

(131) $\prod_{[0,t]} (1 - \Lambda(dv)) = \exp(-\Lambda_c(t)) \prod_{v \le t} (1 - \Lambda_d(v))$,

where $\Lambda_c$ denotes the continuous part of $\Lambda$ and $\Lambda_d(v)$ the size of its jump at $v$. Suppressing the dependence on $\rho$, $\eta$, $E[\Lambda(t)] = \Lambda_0(t)$ where $\Lambda_0$
denotes a prior cumulative hazard specification. It follows that,

(132) $E[S(t)] := 1 - F_0(t) = \prod_{u \le t} (1 - E[\Lambda(du)])$.
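As a concrete instance of the product-integral representation, the following sketch computes the Kaplan-Meier estimate as the finite product of one minus the Nelson-Aalen increments. The data layout (parallel lists of times and event indicators) and the function names are assumptions made for this illustration only; the at-risk convention used is "time at least $t$".

```python
def nelson_aalen_increments(times, events):
    """Return [(t, d_t / Y_t)] over distinct observed event times t.

    d_t is the number of observed events at t and Y_t the number of
    subjects with recorded time >= t (the at-risk count).
    """
    increments = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)
        deaths = sum(1 for s, e in zip(times, events) if e and s == t)
        increments.append((t, deaths / at_risk))
    return increments


def kaplan_meier(times, events, t):
    """Finite product integral: prod over event times u <= t of (1 - dLambda(u))."""
    survival = 1.0
    for u, d_lambda in nelson_aalen_increments(times, events):
        if u <= t:
            survival *= 1.0 - d_lambda
    return survival


# Toy data: event indicator 1 = observed failure, 0 = right censored.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
km_at_2 = kaplan_meier(times, events, 2)
km_at_3 = kaplan_meier(times, events, 3)
```

Here the hazard estimate is purely discrete, so the exponential factor in (131) is identically one and only the product over jumps remains.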



The restriction of the jumps of the process to [0, 1] ensures that A is an element in the space of cumulative 
hazards and hence S is a proper survival function. Hjort (1990) first proposed working directly with Levy 
priors on the space of cumulative hazards which is more in line with the frequentist counting process analysis 
of event-history models [See Aalen(1975, 1978) and Andersen, Borgan, Gill and Keiding (1993)]. Hjort (1990) 
shows that if A is specified to be a Beta process then it is a conjugate model with respect to right censoring. 



Hjort (1990, section 7A), under a Beta process specification for $\Lambda$ in (130), also defines a class of generalised
Dirichlet processes on $\mathcal{R}^+$. He shows that the Dirichlet process is a special case of this model by setting
$c(s) = \theta F_0([s, \infty))$. In summary, Bayesian nonparametric methods for the simple survival setting subject
to censoring have been discussed following the framework of Ferguson's (1973, 1974) Dirichlet process (see also Freedman
(1963) and Fabius (1964)) in the works of Doksum (1974), Susarla and van Ryzin (1976),
Blum and Susarla (1977), Ferguson and Phadia (1979), Lo (1993), Doss (1994), and Walker and Muliere
(1997) among others. The methods discussed above operate by placing a Dirichlet or more general NTR
prior on the unknown survival or distribution function. An alternative but essentially equivalent approach
involves working with priors on the cumulative hazard measure, as discussed in Hjort (1990), Lo (1993) and most
recently Kim (1999). However, unlike the simplicity of the Dirichlet process discussed in Ferguson (1973,
1974) for complete data models, the technical aspects of these models appear to be formidable. Moreover, the
technical arguments used do not easily extend to more complex settings. In particular, the prior to posterior
characterizations given in Ferguson and Phadia (1979) (see also Doksum (1974)) are only developed for
distribution functions on the real line. In addition very little is known about the marginal and partition
based structures. It will be shown that an alternate representation makes the calculus for NTR processes
indeed straightforward.

Note importantly that there is a 1-1 correspondence between each $Z$ and $\Lambda$. Formally, the Levy measure
of $Z$ arises as the image of $\rho_\Lambda(du|y)\eta(dy)$ via the map $(u,y)$ to $(-\log(1 - u), y)$. For further discussion
see Dey (1999) and Dey, Erickson and Ramamoorthi (2000). An important consequence, which has not
been exploited in this context, is the following identity, which holds in distribution for each n where $N$ is
$\mathcal{P}(dN|\rho_\Lambda, \eta)$. Define for $v > 0$,

(133) $f_{v-}(u,y) := -I\{v > y\} \log(1 - u)$

then,

(134) $S(v-) := e^{-Z(v-)} := e^{-N(f_{v-})}$,

where the law of $N$ is $\mathcal{P}(dN|\rho_\Lambda, \eta)$.
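The identity (134) can be checked numerically on a fixed, finite configuration of atoms. The sketch below assumes a hypothetical list of points $(u_k, y_k)$ standing in for a realisation of $N$ (a real realisation would generally have infinitely many small jumps); the function names are illustrative only.

```python
import math


def f_v_minus(v, u, y):
    """f_{v-}(u, y) = -I{v > y} log(1 - u), as in (133)."""
    return -math.log(1.0 - u) if v > y else 0.0


def survival_left_limit(points, v):
    """S(v-) as the product of (1 - u) over atoms (u, y) with y < v."""
    s = 1.0
    for u, y in points:
        if y < v:
            s *= 1.0 - u
    return s


def poisson_functional(points, v):
    """N(f_{v-}): the sum of f_{v-} over the atoms of the point configuration."""
    return sum(f_v_minus(v, u, y) for u, y in points)


# Hypothetical atoms (jump size u in (0, 1), location y > 0).
points = [(0.5, 1.0), (0.2, 2.5), (0.1, 4.0)]
lhs = survival_left_limit(points, 3.0)
rhs = math.exp(-poisson_functional(points, 3.0))
```

Both sides reduce to $(1 - 0.5)(1 - 0.2)$ for $v = 3$, since only atoms located strictly before $v$ contribute.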

7.1 Definition of Extended NTR processes 

Suppose that $(T_i, X_i)$ denotes a marked pair of random variables on $\mathcal{R}^+ \times \mathcal{X}$ with distribution $F(ds, dx)$. In
this section an answer is provided to the open question of how to extend an NTR process from $\mathcal{R}^+$ to more
general marked Polish spaces. This provides, for instance, a new class of Bayesian models for multivariate
survival models. While indeed it is easy to extend $Z$ or $\Lambda$ to more abstract spaces, the representations in (128)
or (130) do not immediately suggest an obvious extension for $F$. The Dirichlet process, which is defined over
arbitrary spaces, is a notable exception. However, James and Kwon (2000) recently proposed a method which
extends the Beta-Neutral prior of Lo (1993), and by virtue of the equivalences, the Beta-Stacy process in
Walker and Muliere (1997) and the Beta distribution function discussed in Hjort (1990, section 7A), to a spatial
setting. A general definition can be deduced from elements of their construction. A definition for $F$ on $\mathcal{R}^+ \times \mathcal{X}$
is facilitated by the usage of its associated hazard measure on $\mathcal{R}^+ \times \mathcal{X}$. From Last and Brandt (1995, A5.3),
it follows that such a measure always exists and is defined by,

(135) $\Lambda(ds, dx) := I\{t > 0\} \frac{F(ds, dx)}{S(s-)}$.

In particular, $\Lambda(ds, \mathcal{X}) := \Lambda(ds)$ and hence

(136) $S(s-) := \prod_{u < s} (1 - \Lambda(du, \mathcal{X}))$.

An extended NTR is defined as, 

DEFINITION 2. (Extended Neutral to the Right Process) Let $\Lambda$ denote a completely random measure
with intensity $\rho_\Lambda(du|s)\eta(ds, dx)$ for $(u, s, x) \in (0,1] \times (0, \infty) \times \mathcal{X}$. Furthermore, the intensity measure is
chosen such that

$\Lambda_0(ds, dx) := E[\Lambda(ds, dx)|\rho_\Lambda, \eta] := \left[\int_0^1 u\, \rho_\Lambda(du|s)\right] \eta(ds, dx)$

is a hazard measure. [Denote the marginal cumulative hazard $\Lambda_0(ds, \mathcal{X}) := \Lambda_0(ds)$.] Then an Extended
Neutral to the Right process on $\mathcal{R}^+ \times \mathcal{X}$ is defined for $t > 0$ and each $B$, an arbitrary measurable set in $\mathcal{X}$, by

(137) $F(t, B) := \int_0^t S(s-)\Lambda(ds, B) := \int_0^t \prod_{u < s} (1 - \Lambda(du))\, \Lambda(ds, B)$.



In particular, $F(ds, dx) := S(s-)\Lambda(ds, dx)$. The law of $F$ is denoted $\mathcal{PN}(dF|\rho_\Lambda, \eta)$. The random quantities
$S(s-)$ and $\Lambda(ds, dx)$ are independent for each $s$ and arbitrary $x$ and

(138) $E[F(ds, dx)] := E[S(s-)]E[\Lambda(ds, dx)] := e^{-\Lambda_0(s)} \Lambda_0(ds, dx) := F_0(ds, dx)$.

REMARK 30. The definition of an extended neutral to the right process yields, as a special case, a
class of random probability measures on arbitrary spaces $\mathcal{X}$, defined as

(139) $F(dx) := \int_0^\infty S(s-)\Lambda(ds, dx)$.

For instance this expression offers another identity for a Dirichlet process on $\mathcal{X}$.
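The relation $F(ds, dx) = S(s-)\Lambda(ds, dx)$ and the marginalisation in (139) can be sketched for a purely atomic hazard measure with finitely many jump times. The input layout (a list of jump times, each with a dictionary of per-mark hazard jumps) and the function names are assumptions for illustration; they do not come from the paper.

```python
def extended_ntr_masses(hazard):
    """Return {(s, x): F({s}, {x})} with F({s}, {x}) = S(s-) * Lambda({s}, {x}).

    hazard: list of (time, {mark: jump}) pairs; at each time the jumps over
    marks must sum to at most one so that Lambda({s}, X) is a valid hazard.
    """
    masses = {}
    survival = 1.0  # S(s-), the product over earlier times of (1 - Lambda({u}, X))
    for s, by_mark in sorted(hazard, key=lambda item: item[0]):
        for x, jump in by_mark.items():
            masses[(s, x)] = survival * jump
        survival *= 1.0 - sum(by_mark.values())
    return masses


def mark_marginal(masses, mark):
    """F(dx) restricted to one mark: sum over times of S(s-) * Lambda({s}, {mark})."""
    return sum(m for (s, x), m in masses.items() if x == mark)


# Hypothetical hazard jumps; the final total jump of 1.0 exhausts the mass.
hazard = [(1.0, {"a": 0.2, "b": 0.3}), (2.0, {"a": 0.5}), (3.0, {"b": 1.0})]
F = extended_ntr_masses(hazard)
prob_a = mark_marginal(F, "a")
prob_b = mark_marginal(F, "b")
```

Because the last total hazard jump equals one, the masses sum to one and the marginal over marks is a probability measure on the mark space, in the spirit of Remark 30.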

REMARK 31. The definition is expressed in terms of $\Lambda$ rather than $Z$ extended to $\mathcal{R}^+ \times \mathcal{X}$ due to the
natural interpretation of a hazard measure. For instance a description of $F(ds, dx)$ is not easily seen using
$Z$. Nonetheless for each $\Lambda$ and $F$ in $\mathcal{R}^+ \times \mathcal{X}$ one can associate a $Z$ on $\mathcal{R}^+ \times \mathcal{X}$ with again the Levy measure
of $Z$ arising as the image of $\rho_\Lambda(du|y)\eta(dy, dx)$ via the map $(u, y)$ to $(-\log(1 - u), y)$.



REMARK 32. When $B = \mathcal{X}$ it is obvious that $F(\cdot, \mathcal{X})$ is an NTR. In addition, due to the complete
randomness properties of $\Lambda$, $F$ satisfies,

$\mathcal{L}\{(F(t_1, B), F(t_2, B) - F(t_1, B), \ldots, F(t_k, B) - F(t_{k-1}, B))\} = \mathcal{L}\{(V_{1,B}, V_{2,B}(1 - V_1), \ldots, V_{k,B} \prod_{j=1}^{k-1}(1 - V_j))\}$,

where for each $j$, $V_j := V_{j,B} + V_{j,B^c}$ and $V_{i,B}$ is independent of $V_{j,C}$ for $i \ne j$ and $B$, $C$ arbitrary. A posterior
process will be called an extended NTR process if the NTR properties are preserved.

7.2 Posterior distributions and moment formulae 

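Now suppose that one observes $n$ iid observations $(T_i, X_i)$ from $\prod_{i=1}^n F(dT_i, dX_i)$ and consider the following
joint models,

(140) $\left[\prod_{i=1}^n F(dT_i, dX_i)\right] \mathcal{PN}(dF|\rho_\Lambda, \eta)$ and $\left[\prod_{i=1}^n F(dT_i, dX_i)\right] \mathcal{P}(d\Lambda|\rho_\Lambda, \eta)$.

Here we will work with $\Lambda$, and the equivalent expression

(141) $\prod_{i=1}^n S(T_i-)\Lambda(dT_i, dX_i)$.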



If one assumes the classical univariate right censoring applied to the marked data as in Huang and Louis
(1998) then the likelihood under censoring takes the form

(142) $\left[\prod_{i=1}^n S(T_i-)\Lambda(dT_i, dX_i)\right] \prod_{l=1}^m S(C_l-)$,

where $C_1, \ldots, C_m$ denote $m$ independent censoring times which indicate that there are random variables
$T_{n+1}, \ldots, T_{m+n}$ where it is only known that they exceed the respective censored times. Under this assumption
no information for the marks associated with the censored points $T_{n+1}, \ldots, T_{m+n}$ is available.

The primary focus will be to deduce the posterior distribution of both $F$ and $\Lambda$ and related characteristics
of the marginal distribution of $(T_1, X_1), \ldots, (T_n, X_n)$. This will complete the necessary disintegration which
will allow one to apply both the (extended) models for $\Lambda$ and $F$ to a large class of data structures beyond
univariate right censoring. In fact it will become clear from the form of (141) that analysis of right censored
data for NTR models is really the same affair as analysis of the complete data model.

First, for $i = 1, \ldots, n$ define $Y_{T_i-}(s) := I\{T_i > s\}$ and similarly for $l = 1, \ldots, m$ define $Y_{C_l-}(s) := I\{C_l > s\}$.
In addition for $i = 1, \ldots, n$ define $f_{T_i-}$ satisfying,

(143) $(1 - u)^{Y_{T_i-}(s)} := e^{-f_{T_i-}(u,s)}$,

and for $l = 1, \ldots, m$, let $f_{C_l-}$ be defined similarly. Then for each $(n, m)$,

(144) $(1 - u)^{Y_{n,m}(s)} = e^{-f_{n,m}(u,s)}$,

where $Y_{n,m}(s) := \sum_{i=1}^n Y_{T_i-}(s) + \sum_{l=1}^m Y_{C_l-}(s)$. The quantities $f_{n,m}$ and $f$ are special cases of the functions
defined in Proposition 3.1 and 3.2. It follows from the identity in (134) that

(145) $S(T_i-) := e^{-N(f_{T_i-})}$ and $\prod_{i=1}^n S(T_i-) = e^{-N(f_{n,0})}$,

where $N$ is $\mathcal{P}(dN|\rho_\Lambda, \eta)$. Now the likelihood (142) can be rewritten as

(146) $e^{-N(f_{n,m})} \prod_{i=1}^n \Lambda(dT_i, dX_i)$.

This representation, (146), in combination with Proposition 3.1 and Theorem 3.1 yields the posterior
distributions for $\Lambda$, $F$, $Z$ subject to possible right censorship.



Theorem 7.1 (i) The posterior distribution of $\Lambda$ given the model (142) is $\mathcal{P}(d\Lambda|e^{-f_{n,m}}\rho_\Lambda, \eta, \mathbf{T}, \mathbf{X})$. That is, $\Lambda$
is equivalent in distribution to the random measure

(147) $\Lambda(\cdot|Y_{n,m}) + \sum_{j=1}^{n(\mathbf{p})} J_{j,n}\, \delta_{T_j^*, X_j^*}(\cdot)$,

where the law of $\Lambda(\cdot|Y_{n,m})$ is $\mathcal{P}(d\Lambda|e^{-f_{n,m}}\rho_\Lambda, \eta)$, indicating that its intensity measure is,

(148) $(1 - u)^{Y_{n,m}(s)} \rho_\Lambda(du|s)\eta(ds, dx)$.

The $J_{j,n}$ are (conditionally) mutually independent random variables with distribution, for each $j$, depending
on $T_j^*$, $Y_{n,m}(T_j^*)$, defined as in (32) as,

(149) $P(J_{j,n} \in du|e^{-f_{n,m}}\rho_\Lambda, T_j^*) = \frac{u^{e_{j,n}} (1 - u)^{Y_{n,m}(T_j^*)} \rho_\Lambda(du|T_j^*)}{\kappa_{e_{j,n}}(e^{-f_{n,m}}\rho_\Lambda|T_j^*)}$,

and are conditionally independent of $\Lambda(\cdot|Y_{n,m})$.

(ii) The posterior distribution of $F$ is still an extended NTR with distribution $\mathcal{PN}(dF|e^{-f_{n,m}}\rho_\Lambda, \eta, \mathbf{T}, \mathbf{X})$
determined by replacing the random measure $\Lambda$ with (147).

(iii) A posterior distribution of $Z$ is equivalent to the law of the random measure

$Z(\cdot|Y_{n,m}) + \sum_{j=1}^{n(\mathbf{p})} (1 - e^{-J_{j,n}})\, \delta_{T_j^*, X_j^*}(\cdot)$,

where the Levy measure of $Z(\cdot|Y_{n,m})$ arises as the image of $(1 - u)^{Y_{n,m}(s)} \rho_\Lambda(du|s)\eta(ds, dx)$ via the map $(u, s)$
to $(-\log(1 - u), s)$.

When $m := 0$, the results correspond to a complete data model.

PROOF. First set $\mu := \Lambda$, and $w(\Lambda) := e^{-N(f_{n,m})}$. Now apply statement (ii) of Proposition 3.1.



REMARK 33. Theorem 7.1 serves to extend the results for the univariate setting to a spatial setting. 
The previous works for the univariate setting do not use explicitly a partition based representation. More 
importantly the method of proof, which is new, is quite transparent. 

REMARK 34. Note also that due to the non-atomic nature (continuity) of $\eta(ds)$ the quantity $Y_{n,m}(s)$
in (148) can be replaced by

$Y^+_{n,m}(s) := \sum_{i=1}^n I\{s < T_i\} + \sum_{l=1}^m I\{s < C_l\}$.

In other words calculations with respect to $\Lambda(\cdot|Y_{n,m})$ should be understood to be equivalent to those with
respect to $\Lambda(\cdot|Y^+_{n,m})$. This does not apply to the distribution of the jumps $(J_{j,n})$.
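The at-risk count of Remark 34 and the resulting thinning factor $(1 - u)^{Y(s)}$ that multiplies the prior intensity in (148) are simple to compute. The sketch below uses the strict-inequality form of Remark 34; the function names and the sample times are hypothetical.

```python
def at_risk_count(s, event_times, censor_times):
    """Y^+_{n,m}(s): number of observed and censored times strictly greater than s."""
    observed = sum(1 for t in event_times if s < t)
    censored = sum(1 for c in censor_times if s < c)
    return observed + censored


def thinning_factor(u, s, event_times, censor_times):
    """(1 - u)^{Y(s)}: multiplier applied to the prior intensity at jump size u, time s."""
    return (1.0 - u) ** at_risk_count(s, event_times, censor_times)


event_times = [1.0, 2.0, 3.0]
censor_times = [2.5, 0.5]
y_at_2 = at_risk_count(2.0, event_times, censor_times)
factor = thinning_factor(0.5, 2.0, event_times, censor_times)
```

Large jumps at times where many subjects are still at risk are therefore strongly down-weighted in the posterior intensity, while the intensity beyond the largest recorded time is left unchanged.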

Little is known in general about the explicit joint moment structure of Neutral to the Right models. The
results in Doksum (1974) are rather vague. In recent works expressions for the mean and variance are given.
The formulae in Proposition 3.2 can be used to easily obtain various equivalent expressions which go well
beyond a variance calculation. This is seen by setting $w_i(\Lambda) := e^{-N(f_{T_i-})}$ for $i = 1, \ldots, n$, and other obvious
equivalences. For brevity, I will only present a result which yields the relevant joint structure and EPPF for
these models. Such results do not appear in the literature mentioned above.

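Define,

(150) $\Lambda_{n,m}(\infty) := \int_0^\infty \int_0^1 \left(1 - (1 - u)^{Y_{n,m}(s)}\right) \rho_\Lambda(du|s)\eta(ds)$.

In addition for $i = 1, \ldots, n$ and $m \ge 0$ define,

(151) $\Lambda_{i-1,m}(t) := \int_0^t \int_0^1 u (1 - u)^{Y_{i-1,m}(s)} \rho_\Lambda(du|s)\eta(ds)$.

When $m = 0$, denote $\Lambda_{i-1,m}$ as $\Lambda_{i-1}$. Now recall that,

(152) $\kappa_{e_{j,n}}(e^{-f_{n,m}}\rho_\Lambda|T_j^*) = \int_0^1 u^{e_{j,n}} (1 - u)^{Y_{n,m}(T_j^*)} \rho_\Lambda(du|T_j^*)$.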



Proposition 7.1 (i) Form > 0, the (prior) mean catenation for [OI^i 11™=! ) is 



(153) 



E[e- N ^\p A , V ] = e -*».™(«0 = e-A-Vi) 



n 

,i=2 



n 

1=1 



i(Ci, 



(ii) When $m = 0$, the joint marginal distribution of $((T_i, X_i))$ can be expressed as



(154) 



n(p) 

n 

3=1 



ieCj 



K ej Je-^p A \T?) V (dT*,dX*) 



Adjustments for the censored case are obvious. 

(iii) From statement (i) it follows that the corresponding EPPF has the form, 



(155) 



T"(p> 



n(p) 

n 

3=1 



n e-^-iW 
ieCj 



Kej (e-f" PA \T*)r)(dT*) 






Proposition 7.2 Using an algebraic rearrangement the joint marginal distribution and EPPF can be rewritten
respectively as,



(156) 

Where, 
(157) 



7r(p|p A , T*) ft F (dT*,dX*)and f n(p\p A , T*) ff F (dT* 

3 = 1 J ™ 3 = 1 

^, n (e-^p A \T*) 



"(P) [TT _ e" 



Ai-i(T') 



3 = 1 



e- A °( T P Kl ( PA \T*) 



REMARK 35. Naturally, if one were interested in actually generating such partitions, then a
modification of the algorithm in Section 2.3 can be used. For this one could use expressions for the prediction
rule or conditional moment measures which are readily obtainable from an application of Proposition 3.2
combined with Theorem 7.1.



REMARK 36. The propositions above combined with Theorem 7.1 yield expressions for the posterior
disintegrations. It is now straightforward to obtain posterior characterizations for mixtures of extended NTR
models based on kernels $(K_i)$. This framework allows for much more complex structures than right censoring.
I have not seen general mixtures of NTR models proposed in the literature.



7.3 Absolute continuity of general Beta, Beta-Neutral/Stacy models to a canonical
Beta process or Dirichlet process

The general construction of the extended NTR models is an extension of the (presently unpublished) work of James
and Sehyug Kwon (2000). In that work the authors extend Lo's (1993) Beta-Neutral survival and cumulative
hazard processes to the spatial setting. Lo (1993) derives these based on the following explicit construction
for the hazard

(158) $\Lambda_{\tau,\beta}(t) := \int_0^t \frac{\mu_\tau(ds)}{\mu_\tau([s,\infty)) + \mu_\beta([s,\infty))}$,

where $\mu_\tau$, $\mu_\beta$ are independent gamma processes with shape measures $\tau$ and $\beta$ on $\mathcal{R}^+$, which as noted in Lo
(1993) yields an explicit construction of Hjort's (1990) Beta cumulative hazard process. James and Kwon
(2000) extend this definition by simply extending the gamma processes to a spatial setting. Moreover they
show that such models can always be derived from a Dirichlet process on an even larger space. In other words
take a two parameter gamma process, say $\mu_{\tau,\beta}$, on $\mathcal{R}^+ \times \mathcal{X} \times \{0, 1\}$, such that $\mu_{\tau,\beta}(ds, dx, \{1\}) := \mu_\tau(ds, dx)$
etc. A corresponding Dirichlet process can be defined as

(159) $P_{\tau,\beta}(ds, dx, A) := \frac{\mu_{\tau,\beta}(ds, dx, A)}{\mu_{\tau,\beta}(\mathcal{R}^+ \times \mathcal{X} \times \{0,1\})}$.

Then the extension of James and Kwon (2000) can be deduced from the extended Beta-Neutral hazard
measure,

(160) $\Lambda_{\tau,\beta}(ds, dx) := \frac{\mu_{\tau,\beta}(ds, dx, \{1\})}{\mu_{\tau,\beta}([s,\infty) \times \mathcal{X} \times \{0,1\})}$,

where $\tau$ and $\beta$ are now measures on $\mathcal{R}^+ \times \mathcal{X}$. If $\beta$ is set to zero in (160) then the corresponding (extended)
Neutral to the Right process is a Dirichlet process with shape parameter $\tau$ which, without loss of generality,
is set to $\theta F_0$. By virtue of the essential equivalence of the Beta-Neutral process to the Beta-Stacy process of
Walker and Muliere (1997) and the Beta distribution function of Hjort (1990, Section 7A), the procedure of
James and Kwon (2000) includes these models as well. In addition they showed that the (posterior) conjugacy
of the Beta-type models on $\mathcal{R}^+$ is preserved under the right censored data spatial model discussed earlier.
However their technique relied on very special properties of the Dirichlet process and is quite different
from what has been presented in the previous sections. Here a new result is established which shows how one
may transform a Dirichlet process or simple Beta process to a more general one via a change of measure. A
consequence is that the calculus for such models follows from the calculus for the Dirichlet process plus an
application of Proposition 3.1.

Recall that a Beta process on $\mathcal{R}^+$ with parameters $c(s)$ and $\Lambda_0(ds, dx)$ yields a Dirichlet process if and
only if $c(s) := \theta F_0([s, \infty))$. Such a Beta process can be thought of as a canonical Beta process.

Proposition 7.3 Let $\mathcal{P}(d\Lambda|c, \Lambda_0)$ denote the law of a Beta process with parameters $c$ and $\Lambda_0(ds, dx)$. Let $Z$
denote the corresponding Levy process defined via the map $(u, y)$ to $(-\log(1 - u), y)$ and define a decreasing
function $\beta$ on $\mathcal{R}^+$ such that,

$T_\beta := \int_0^\infty \beta(v) Z(dv) < \infty$.

Then the following disintegration holds

(161) $e^{-T_\beta} \mathcal{P}(d\Lambda|c, \Lambda_0) := \mathcal{P}(d\Lambda|c + \beta, \Lambda_0) E[e^{-T_\beta}]$,

where $-\log E[e^{-T_\beta}]$ is

(162) $\int_0^\infty \int_0^1 \left(1 - (1 - u)^{\beta(s)}\right) u^{-1} (1 - u)^{c(s)-1}\, du\, c(s) \Lambda_0(ds)$.
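The inner integral over $u$ in (162) can be examined numerically for fixed values of $c(s)$ and $\beta(s)$. The sketch below is an illustration under stated assumptions: it takes the integrand to be $(1 - (1-u)^{\beta}) u^{-1} (1-u)^{c-1}$, uses a plain midpoint rule, and restricts the check to integer $\beta = k$ with $c \ge 1$, where the substitution $x = 1 - u$ gives the elementary value $\sum_{i=0}^{k-1} 1/(c + i)$. The function names are hypothetical.

```python
def inner_integral(c, beta, n=20_000):
    """Midpoint rule for int_0^1 (1 - (1 - u)^beta) u^{-1} (1 - u)^{c - 1} du."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h  # midpoints avoid the endpoints u = 0 and u = 1
        total += (1.0 - (1.0 - u) ** beta) / u * (1.0 - u) ** (c - 1.0)
    return total * h


def harmonic_form(c, k):
    """Closed form of the same integral for integer beta = k: sum_{i<k} 1 / (c + i)."""
    return sum(1.0 / (c + i) for i in range(k))


numeric = inner_integral(2.0, 3)
exact = harmonic_form(2.0, 3)
```

For general real $\beta$ the same integral equals a difference of digamma functions, $\psi(c + \beta) - \psi(c)$; the integer case is used here only because it can be verified without special-function libraries.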

The proof follows by using the alternate representation

(163) $e^{-T_\beta} := e^{-N(f_\beta)}$,

where $f_\beta(u, s) := -\beta(s)\log(1 - u)$ and $N$ has a Poisson law corresponding to $\mathcal{P}(d\Lambda|c, \Lambda_0)$. An interesting
feature of this result is that one can obtain quite easily an alternate expression for the EPPF of a
Beta-Neutral/Stacy model by first applying the result for a Dirichlet process. That is,

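Proposition 7.4 Suppose that $F$ is an NTR process determined by the Beta process with parameters $c^* := \theta F_0 + \beta$,
then the EPPF is given by,

(164) $PD(\mathbf{p}|\theta) \prod_{j=1}^{n(\mathbf{p})} \int_0^\infty \frac{\Gamma(e_{j,n} + \theta F_0([y,\infty)))\, \Gamma(\theta F_0([y,\infty)) + \beta(y))}{\Gamma(\theta F_0([y,\infty)))\, \Gamma(e_{j,n} + \theta F_0([y,\infty)) + \beta(y))}\, F_0(dy)$.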



REMARK 37. The change of measure in Proposition 7.3 can of course be extended easily to non-Beta
processes. In general, the updated law corresponds to a random hazard measure with Levy measure $e^{-f_\beta}\rho$.
Note that much more general choices of $f_\beta$ can be used via Proposition 3.1.

REMARK 38. The correspondence between the Beta-Stacy process of Walker and Muliere (1997) and
Hjort's (1990, section 7A) process is noted explicitly in Dey (1999) and Dey, Erickson, and Ramamoorthi
(2000). The equivalence between the Beta-Neutral process and Hjort's (1990) process was noted in Lo
(1993). Given the gaps in the literature it is apparent that Beta-Neutral processes are not as well known as
their equivalent counterparts. This seems to be caused by the title of Lo (1993), which concerns a Censored
Data Bayesian Bootstrap.

REMARK 39. NTR processes seem to arise naturally in coalescent theory. See in particular Pitman
(1999, Proposition 26) which is not a Dirichlet process. Similar types of processes with drift appear
in Bertoin (2001).

8 Posterior Distributions of Normalised processes and Poisson-Kingman models

In this section I briefly discuss calculations for probability measures $P$ defined using the weighted Poisson
distribution $\mathcal{Q}(dN|\rho, \eta)$ which are more in line with the results and methods used in Perman, Pitman and
Yor (1992), Pitman and Yor (1992) and Pitman (1995b). Here descriptions of pertinent quantities will
be given in terms of the biased jumps denoted as $J$. One will see that the forms of the results appear quite
different from those in Section 5. For completeness, the definition of Poisson-Kingman models based on length biased
sampling presented in Pitman (1995b) is given.

Definition 8.1 (Pitman (1995b)) Let $(P_i) = (J_i/T)$ be a ranked discrete distribution derived from the ranked
points of a Poisson Process with Levy density $\rho$ (not depending on $y$) of random lengths $J_1 \ge J_2 \ge \cdots \ge 0$
by normalizing their lengths by their sum which is $T$. Let $(\tilde{P}_j)$ be a size-biased permutation of $(P_i)$ and let
$\tilde{J}_j = (T\tilde{P}_j)$ be the corresponding size-biased permutation of the ranked lengths $(J_i)$. The law of the sequence
$(P_i)$ will be called the Poisson-Kingman distribution with Levy density $\rho$, and denoted $PK(\rho)$. Denote by
$PK(\rho|t)$ the regular conditional distribution of $(P_i)$ given $(T = t)$ constructed above. For a probability
distribution $\gamma$ on $(0, \infty)$, let

$PK(\rho, \gamma) := \int_0^\infty PK(\rho|t)\, \gamma(dt)$

be the distribution on the space of $(P_i)$ [which is the space of decreasing sequences of positive real numbers
with sum 1]. Call $PK(\rho, \gamma)$ the Poisson-Kingman distribution with Levy density $\rho$ and mixing distribution
$\gamma$.

Pitman (1995b) points out that knowledge of the conditional law $PK(\rho|t)$ allows one to generate explicit
results for distributions $PK(\rho, \gamma)$, in particular the corresponding EPPF, by simply mixing over different
candidate densities, $\gamma(dt)$, for $T$. The case of the two parameter Poisson-Dirichlet family is explained in
detail in Pitman (1995b) as mentioned in section 5.
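The size-biased permutation appearing in the Poisson-Kingman definition can be sketched for a finite list of lengths. This is an illustration only: a genuine Poisson-Kingman sequence has infinitely many jumps, and the function names and sample lengths below are hypothetical. At each step one remaining length is chosen with probability proportional to its size and removed.

```python
import random


def normalise(lengths):
    """Divide each length by the total T so that the weights sum to one."""
    total = sum(lengths)
    return [length / total for length in lengths]


def size_biased_permutation(weights, rng):
    """Return the weights in size-biased random order (sampling without replacement)."""
    remaining = list(weights)
    ordered = []
    while remaining:
        threshold = rng.random() * sum(remaining)
        cumulative = 0.0
        chosen = len(remaining) - 1  # fallback guards against floating-point round-off
        for index, weight in enumerate(remaining):
            cumulative += weight
            if threshold < cumulative:
                chosen = index
                break
        ordered.append(remaining.pop(chosen))
    return ordered


ranked_lengths = [4.0, 2.0, 1.0, 1.0]   # J_1 >= J_2 >= ... (finite toy example)
ranked_weights = normalise(ranked_lengths)
rng = random.Random(0)
biased_weights = size_biased_permutation(ranked_weights, rng)
```

The output is a rearrangement of the same weights, so the total mass is unchanged; only the order carries the size-biased randomness.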

The introduction of the measure $\mathcal{Q}(dN|\rho, \eta)$ serves to incorporate fully the Poisson-Kingman idea while
maintaining the approach of Poisson calculus at the level of $N$. Some properties of this class, which shall
become clear in the next section, are now described. When $\rho$ is homogeneous and $w(N) := w(N(\cdot, \mathcal{Y}))$ then
$P$ is a species sampling model. Additionally when $w(N) := g(T)$ and $h(s) := s \in (0, \infty)$ then the random
atoms of $P$, $(P_i)$, are $PK(\rho, \gamma)$. In that case



(165) 




(166) $\gamma(dt) := \frac{g(t) f_T(dt)}{E[g(T)|\rho, \eta]}$,

where $g$ is nonnegative and integrable but otherwise arbitrary, and $f_T$ denotes the density of $T$ with respect to
$\mathcal{P}(dN|\rho, \eta)$. In other words the change of measure at $N$ via $\mathcal{Q}(dN|\rho, \eta)$ induces the appropriate change of
measure at the level of the PK measure. The results below follow from a straightforward application of
Lemma 2.2; details are omitted.
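The mixing distribution in (166) is a reweighting of the law of $T$ by $g$. A minimal sketch, assuming for illustration that $T$ takes finitely many values (in the setting of the text $T$ has a density), is given below; the function name `tilt` and the toy distribution are hypothetical.

```python
def tilt(support, probs, g):
    """Return gamma(t) = g(t) f_T(t) / E[g(T)] for a discrete stand-in for the law of T."""
    weights = [g(t) * p for t, p in zip(support, probs)]
    normaliser = sum(weights)  # E[g(T)] under the original law
    if normaliser <= 0.0:
        raise ValueError("g must be nonnegative with E[g(T)] > 0")
    return [w / normaliser for w in weights]


support = [1.0, 2.0, 4.0]
probs = [0.5, 0.25, 0.25]
gamma = tilt(support, probs, lambda t: t)        # g(t) = t
unchanged = tilt(support, probs, lambda t: 1.0)  # g = 1 leaves the law of T as it is
```

Taking $g \equiv 1$ returns the original law, matching the unweighted case, while other choices of $g$ move mass towards the values of $T$ that $g$ favours.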

8.1 Posterior characterizations 

In the theorem below it is stated that the posterior law of P|Y, denoted as V(dP\Y), corresponds to the 
law of a random measure defined as, 

"(P) , /T \ 

(167) P n *(-) := R n(p) P(-) + (1 - R n{p] ) J2 ^ m( p)l> T : *?(•). 

3=1 2^.7=1 

where, R n(p) := T/ [t + h(l ' 

Theorem 8.1 Let $\{Y_1, \ldots, Y_n\}$ be iid $P$ where the prior law of $P$ is determined by the weighted Poisson
measure $\mathcal{Q}(dN|\rho, \eta)$. Then, the posterior distribution of $P|\mathbf{Y}$ is equivalent to the distribution of the random
measure $P_n^*$ defined in (167) whose law is now determined by the joint probability measure



n(p) / n(p) \ n(p) 

(168) Q(dNjeds\Y,h)oiw(N+J2^ 3 ,Y;)P(dN\p 1 r i )iT+J2Hsj) ]J h{s f^ p(ds \Y*) 

3=1 \ 3=1 / 3=1 

The marginal distribution of $\{Y_1, \ldots, Y_n\}$ corresponds to the un-normalised term on the right hand side of
(168), integrated over $\mathcal{P}(dN|\rho, \eta)$, and multiplied by $\eta(dY^*)$.



REMARK 40. Notice that the expression above remains complicated even if $w(N)$ is replaced by 1.
The result is more in line with Perman, Pitman and Yor (1992) and Pitman (1995b). In comparison with
section 5 the law of the random measure $P^*$ is a bit more complex as it is described via the biased jumps
$J$ which have a much more complex (non-independent) joint distribution. Under moment conditions other
forms of the posterior are easily obtained.

Now letting 

n(p) 

(169) r n(p) :=T 

3 = 1 

it is quite clear that one can gain further interpretation by adapting the descriptions given in Perman, Pitman 
and Yor (1992, section 4) and Pitman (1995b). Note in particular when n = 1, and w(N) :— g(T) and p is 
homogeneous an evaluation of the expectation of P reveals the structural distributions, 

(170) P ff (J! e ds, Tl e AO = {tl + s)E[g{T)] = -FMTj { l ' 1 l} 

A suitable change of variable yields 

(171) $P_g(T \in dt, T_1 \in dt_1) = \frac{g(t)}{E[g(T)]} P(T \in dt, T_1 \in dt_1) = \gamma(dt)\, P(T_1 \in dt_1|T = t)$

and more generally for each n,



P 3 (p, J g ds, T n(p) G dt) = gg, P (p, J G ds, T n(p) G rft) . 

The joint distributions $P(J_1 \in ds, T_1 \in dt_1)$ and $P(\mathbf{p}, \mathbf{J} \in d\mathbf{s}, T_{n(\mathbf{p})} \in dt)$ are given in Pitman (1995b)
and Perman, Pitman and Yor (1992). One can for instance obtain formulae for the joint distribution of
$(T, T_1, \ldots, T_n)$ via $(g(t)/E[g(T)])\, P(T \in dt, T_1 \in dt_1, \ldots, T_n \in dt_n)$ using Perman, Pitman and Yor (1992,
Theorem 2.1). All of these facts correspond with the $PK(\rho, \gamma)$ concept. See additionally Pitman and Yor
(1997, Proposition 47).

Combining the description in Pitman (1995b, Lemma 5 and equation(30)) with Theorem 8.1 the next 
result follows. 



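Corollary 8.1 Suppose that $w(N + \sum_{j=1}^{n(\mathbf{p})} \delta_{s_j, Y_j^*}) := w(N; \sum_{j=1}^{n(\mathbf{p})} \delta_{s_j})$ does not depend on $\mathbf{Y}^*$ for all $n$. Then
a joint law of $\mathbf{p}$, $\mathbf{J}$, $\mathbf{Y}^*$ is defined as;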



»(p) 



(172) 



E[w{N)} 



<^Eg S S] )V(dN\p,n) 



n(p) 

n h{ Sj r^ P {d Sj \Y*) V {dY*). 



As a consequence, the distribution of $\mathbf{Y}|\mathbf{J}, T_{n(\mathbf{p})}, \mathbf{p}$ is conditionally independent of $T_{n(\mathbf{p})}$ such that the sequence
$\{Y_1, \ldots, Y_n\}$ consists of $n(\mathbf{p})$ unique values $\{Y_1^*, \ldots, Y_{n(\mathbf{p})}^*\}$ which are independent with distribution

(173) $P(dY_j^*|J_j, \mathbf{p}) \propto \rho(J_j|Y_j^*)\eta(dY_j^*)$

for $j = 1, \ldots, n(\mathbf{p})$. Additionally the joint distribution of $\mathbf{J}$, $\mathbf{p}$ is



(174) P(Jeds,p) 



1 



w(N;^^ S 8j )P(dN\p,r,) 

(r + E-^^i))" 



£>(iV)] 

If additionally $\rho$ is homogeneous then



n(p) 

n ft (^) e 



p(dsj\y)r)(dy). 



n(p) 



(175) 

where, the EPPF is 
(176) p P (ei, . . . ,e„ (p) ) 



P(dY) = p„( ei , . . . , e n(p) ) J] H(dYf), 

3 = 1 



w(N; Y%V> 6 S] )V(dN\p, V ) flf™ h{ Sj ) e - p(ds 



E[w{N)} J MxS m p -> 



(p) 



(r + ES ) M«j)) n 



Moreover for every $n$, the unique values of $\{Y_1, \ldots, Y_n\}|\mathbf{p}$, that is $\{Y_1^*, \ldots, Y_{n(\mathbf{p})}^*\}$,
are now iid $H$. If $w(N) := g(T)$ then the EPPF is;



(177) 



L 



g (t + Eg } gjjMWgg; M^) e -P(d3 



Tl(p) 



When $h(s) = s \in (0, \infty)$ and $g(T) = 1$, the expression in (177) corresponds exactly with equation (31) of
Pitman (1995b).

REMARK 41. One could simply apply a Fubini argument to Theorem 8.1 to deduce the existence of the
relevant joint distributions. However, without any interpretation gained via Perman, Pitman and Yor (1992)
and Pitman (1995b) the result is somewhat vacuous. Again some of those interpretations may also be of
interest to practicing Bayesian statisticians, as they certainly go beyond the usual mean/variance assessment
used to compare different random probability measures.



REMARK 42. The results above serve also to add to the explicit formulae given in Pitman (1995b)
for the length biased case. Note that one could apply (67) to obtain alternate representations for the EPPF
as in Pitman (1995b, Corollary 6). An additional interesting feature of Corollary 8.1 is the conditional
independence result in (173).



8.2 Mixture models 

Theorem 8.1 and its corollary can certainly handle structures of the form

(178) $\mathcal{Q}(dN|\rho, \eta) \prod_{i=1}^n K(X_i|Y_i) P(dY_i)$.

A description of the relevant posterior laws is presented below.

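Theorem 8.2 The posterior distribution of $\mathbf{Y}, P|\mathbf{X}$ based on the model (178) is representable as,

$P(d\mathbf{Y}, dP|\mathbf{X}) \propto \mathcal{P}(dP|\mathbf{Y}) \left[\prod_{j=1}^{n(\mathbf{p})} P(dY_j^*|J_j, \mathbf{p}, \mathbf{X})\right] P(\mathbf{J} \in d\mathbf{s}, \mathbf{p}|\mathbf{X})$,

where a conditional distribution of $\mathbf{Y}|\mathbf{J}, \mathbf{p}, \mathbf{X}$ is given such that the sequence $\{Y_1, \ldots, Y_n\}$ consists of $n(\mathbf{p})$
unique values $\mathbf{Y}^* = \{Y_1^*, \ldots, Y_{n(\mathbf{p})}^*\}$ which are independent with respective distributions,

(179) $P(dY_j^*|J_j, \mathbf{p}, \mathbf{X}) = \frac{\prod_{i \in C_j} K(X_i|Y_j^*)\, P(dY_j^*|J_j, \mathbf{p})}{\int_{\mathcal{Y}} \prod_{i \in C_j} K(X_i|Y_j^*)\, P(dY_j^*|J_j, \mathbf{p})}$

for $j = 1, \ldots, n(\mathbf{p})$. In addition $P(\mathbf{J} \in d\mathbf{s}, \mathbf{p}|\mathbf{X}) \propto P(\mathbf{J} \in d\mathbf{s}, \mathbf{p}) \prod_{j=1}^{n(\mathbf{p})} \int_{\mathcal{Y}} \left[\prod_{i \in C_j} K(X_i|Y_j^*)\right] P(dY_j^*|J_j, \mathbf{p})$.

It follows that when $\rho$ does not depend on $y$, (179) does not depend on $\mathbf{J}$, $T$. Hence in that case posterior
characterizations of $P$, $\mathbf{Y}$ do not differ in form from the results in Lo (1984) and Ishwaran and James (2001a).
The general case however is a different matter entirely.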